Post on 30-Dec-2015
Introduction to GRID Computing
Bebo White, bebo@slac.stanford.edu
New Directions in Information Technology Series
Contra Costa College
Fall 2005
DataGrid is a project funded by the European Union
Today's Goals
  To provide an introduction to key Grid computing and Web services issues, techniques, and technologies
  To provide a substantial background and vocabulary to support future studies in Grid computing and Web services
  To describe some of the current applications of Grid computing
  To describe some of the current Grid computing initiatives
Grid Hype
The Power Grid: On-Demand Access to Electricity
  Decouple production & consumption, enabling
    On-demand access
    Economies of scale
    Consumer flexibility
    New devices
  [Figure axes: Time; Quality, economies of scale]
The Shape of Grids to Come?
A Grid Checklist (#1)
  A system that coordinates resources that are not subject to centralized control
  Integrates and coordinates resources and users that live within different control domains (for example, the user's desktop vs. central computing; different administrative units of the same company; or different companies) and addresses the issues of security, policy, payment, membership, and so forth that arise in these settings.
  Otherwise, we are dealing with a local management system.
  (Ian Foster)
A Grid Checklist (#2)
  A system that uses standard, open, general-purpose protocols and interfaces
  Is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access.
  It is important that these protocols and interfaces be standard and open.
  Otherwise, we are dealing with an application-specific system.
  (Ian Foster)
A Grid Checklist (#3)
  A system that delivers nontrivial qualities of service.
  Allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, relating, for example, to response time, throughput, availability, and security, and/or co-allocation of multiple resource types to meet complex user demands, so that the utility of the combined system is significantly greater than the sum of its parts.
  (Ian Foster)
What is Grid Computing?
  Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations [I. Foster]
  A VO is a collection of users sharing similar needs and requirements in their access to processing, data and distributed resources and pursuing similar goals.
  Key concept: Ability to negotiate resource-sharing arrangements among a set of participating parties (providers and consumers) and then to use the resulting resource pool for some purpose [I. Foster]
The Grid Problem
  Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
    From The Anatomy of the Grid: Enabling Scalable Virtual Organizations
  Enable communities (virtual organizations) to share geographically distributed resources as they pursue common goals, assuming the absence of
    central location,
    central control,
    omniscience,
    existing trust relationships.
Elements of the Problem
  Resource sharing
    Computers, storage, sensors, networks, ...
    Sharing always conditional: issues of trust, policy, negotiation, payment, ...
  Coordinated problem solving
    Beyond client-server: distributed data analysis, computation, collaboration, ...
  Dynamic, multi-institutional virtual orgs
    Community overlays on classic org structures
    Large or small, static or dynamic
The Grid Information Problem
  There is a need for different views of the information depending upon
    VO membership
    Security constraints
    Intended purpose
    Etc.
Why Grids?
  Scale of the problems/applications
    Solving problems that are bigger than any one data center can hold
  Size of user communities
    Leading research in many different fields today requires collaborations that span research centers and countries (i.e. multi-domain access to distributed resources)
  Need to provide access to large data processing power and huge data storage
What Kinds of Applications?
  Computation intensive
    Interactive simulation (climate modeling)
    Large-scale simulation (galaxy formation, gravity waves, battlefield simulation)
    Engineering (parameter studies, linked models)
  Data intensive
    Experimental data analysis (high energy physics)
    Image, sensor analysis (astronomy, climate)
  Distributed collaboration
    Online instruments (microscopes, x-ray devices)
    Remote visualization (climate studies, biology)
    Engineering (structural testing, chemical)
Online Access to Scientific Instruments
  DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
  [Figure labels: Advanced Photon Source; real-time collection; tomographic reconstruction; archival storage; wide-area dissemination; desktop & VR clients with shared controls]
Mathematicians Solve NUG30
  Looking for the solution to the NUG30 quadratic assignment problem
  The problem involves assigning 30 facilities to 30 fixed locations so as to minimize the total cost of transferring material between the facilities.
  An informal collaboration of mathematicians and computer scientists
  Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
  14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23
  MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
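To make the quadratic assignment problem concrete, here is a minimal sketch of its cost function and a brute-force solver. The 3-facility flow and distance matrices are invented for illustration (they are not NUG30 data), and the function names are hypothetical. Exhaustive search is only workable for tiny instances: with 30 facilities there are 30! possible assignments, which is why the real computation needed a Grid and a far smarter search method.

```python
from itertools import permutations


def qap_cost(flow, dist, assignment):
    """Total transfer cost when facility i is placed at location assignment[i]."""
    n = len(assignment)
    return sum(
        flow[i][j] * dist[assignment[i]][assignment[j]]
        for i in range(n)
        for j in range(n)
    )


def solve_by_enumeration(flow, dist):
    """Try every assignment; feasible only for very small instances."""
    n = len(flow)
    return min(permutations(range(n)), key=lambda p: qap_cost(flow, dist, p))


# Hypothetical toy instance: flow[i][j] is material moved between facilities,
# dist[a][b] is the distance between locations.
flow = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]

best = solve_by_enumeration(flow, dist)
best_cost = qap_cost(flow, dist, best)
```

The heaviest flows end up on the shortest distances, which is the intuition behind the objective.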
Home Computers Evaluate AIDS Drugs
  Community =
    1000s of home computer users
    Philanthropic computing vendor (Entropia)
    Research group (Scripps)
  Common goal = advance AIDS research
Network for Earthquake Engineering Simulation
  NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
  On-demand access to experiments, data streams, computing, archives, collaboration
  NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
The LHC Detectors
  CMS, ATLAS, LHCb
  ~6-8 PetaBytes / year
  ~10^8 events/year
  ~10^3 batch and interactive users
  High Energy Physics
  Federico.carminati, EU review presentation
Data Grids for High Energy Physics
  Image courtesy Harvey Newman, Caltech
Solving Large Problems Pre-Grid
  Once upon a time...
  [Figure labels: mainframe; minicomputer; microcomputer; cluster] (by Christophe Jacquet)
The Grid Distributed Computing Idea
  ...and today (by Christophe Jacquet)
Differences Between Grids and Distributed Applications
  Huge distributed applications already exist, but they tend to be specialized systems intended for a single purpose or user group, e.g., SETI@Home, FightAIDS@Home
  Grids go further and take into account:
    Different kinds of resources
      Not always the same hardware, data and applications
      No parallelization required
    Different kinds of interactions
      User groups or applications want to interact with Grids in different ways
    Dynamic nature
      Resources and users added/removed/changed frequently
The Grid Vision
Broader Context
  Grid Computing has much in common with major industrial thrusts
    Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing
  Sharing issues not adequately addressed by existing technologies
    Complicated requirements: run program X at site Y subject to community policy P, providing access to data at Z according to policy Q
    High performance: unique demands of advanced and high-performance systems
Grid Types - Physical
  Cluster Grid
  Enterprise Grid
  Global Grid
Grid Types - Logical
  Data Grid: responds to requests for computers and data stores; similar to (but more secure and auditable than) today's research grids
  Information Grid: responds to requests for computational processes that may require several data sources and processing stages to deliver a desired result
  Knowledge Grid: responds to high-level questions and finds the appropriate processes to deliver answers in the required form
The Classical (early) Grid
  Focused on applications where data was stored in files
    Little support for transactions, relational database access or distributed query processing
  Exploits a range of protocols such as:
    LDAP for directory services and file store queries
    GridFTP for large-scale reliable data transfer
    SSL for security
Why Now?
  Moore's law improvements in computing produce highly functional end systems
  The Internet and burgeoning wired and wireless networks provide universal connectivity
  Changing modes of working and problem solving emphasize teamwork, computation
  Network exponentials produce dramatic changes in geometry and geography
Network Exponentials
  Network vs. computer performance
    Computer speed doubles every 18 months
    Network speed doubles every 9 months
    Difference = order of magnitude per 5 years
  1986 to 2000
    Computers: x 500
    Networks: x 340,000
  2001 to 2010
    Computers: x 60
    Networks: x 4000
  Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner Perkins Caufield & Byers.
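The "order of magnitude per 5 years" claim follows directly from the two doubling periods. The short sketch below works through that arithmetic under an idealized assumption of perfectly constant doubling; the function name is made up for illustration. The historical multipliers quoted on the slide come from the cited graph and need not match this idealized model exactly.

```python
def growth_factor(years, doubling_months):
    """Improvement factor after `years`, given a fixed doubling period in months."""
    return 2 ** (years * 12 / doubling_months)


computer_5y = growth_factor(5, 18)   # about 10x in five years
network_5y = growth_factor(5, 9)     # about 100x in five years
gap_5y = network_5y / computer_5y    # about 10x: one order of magnitude
```

Because the network doubling period is half the computer doubling period, the network factor is the square of the computer factor over any interval, so the gap keeps widening.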
The 13.6 TF TeraGrid: Computing at 40 Gb/s
  [Figure: site resources and external networks at four sites: NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, Argonne; archival storage labeled HPSS and UniTree]
  TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne
  www.teragrid.org
iVDGL: International Virtual Data Grid Laboratory
  U.S. PIs: Avery, Foster, Gardner, Newman, Szalay
  www.ivdgl.org
Main Services of a Grid Architecture
  Service providers
    Publish the availability of their services via information systems
    Such services may come and go or change dynamically
    E.g. a testbed site that offers x CPUs and y GB of storage
  Service brokers
    Register and categorize published services and provide search capabilities
    E.g. 1) SLAC Resource Broker selects the best site for a job 2) Catalogues of data held at each testbed site
  Service requesters
    Single sign-on: log into the Grid once
    Use brokering services to find a needed service and employ it
    E.g. CMS physicists submit a simulation job that needs 12 CPUs for 6 hours and 15 GB which gets scheduled, via the Resource Broker, on the CERN testbed site
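The provider, broker, and requester roles can be sketched in a few lines. This is a toy model, not the behavior of any real Resource Broker: the class names, the site capacities, and the "most free CPUs wins" selection rule are all assumptions made for illustration. Only the job size (12 CPUs, 6 hours, 15 GB) is taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class SiteOffer:
    """What a service provider publishes about itself."""
    name: str
    free_cpus: int
    free_storage_gb: int


@dataclass
class JobRequest:
    """What a service requester asks for."""
    cpus: int
    hours: int
    storage_gb: int


class ResourceBroker:
    """Registers published offers and matches jobs against them."""

    def __init__(self):
        self._offers = {}

    def publish(self, offer):
        # Providers may re-publish at any time as their capacity changes.
        self._offers[offer.name] = offer

    def withdraw(self, name):
        # Services may come and go dynamically.
        self._offers.pop(name, None)

    def select_site(self, job):
        """Return the name of a site that can run the job, or None."""
        candidates = [
            o for o in self._offers.values()
            if o.free_cpus >= job.cpus and o.free_storage_gb >= job.storage_gb
        ]
        if not candidates:
            return None
        # Assumed policy: prefer the site with the most free CPUs.
        return max(candidates, key=lambda o: o.free_cpus).name


broker = ResourceBroker()
broker.publish(SiteOffer("CERN", free_cpus=64, free_storage_gb=500))  # hypothetical capacity
broker.publish(SiteOffer("SLAC", free_cpus=8, free_storage_gb=200))   # hypothetical capacity

job = JobRequest(cpus=12, hours=6, storage_gb=15)
chosen = broker.select_site(job)
```

A production broker would also weigh policy, data location, and queue length; the point here is only the publish, register, and match cycle.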
Grid Security
  Resource providers are essentially opening themselves up to itinerant users
  Secure access to resources is required
  X.509 Public Key Infrastructure
    User's identity has to be certified by (mutually recognized) national Certification Authorities (CAs)
    Resources (node machines) have to be certified by CAs
    Temporary delegation from users to processes to be executed in the user's name (proxy certificates)
  Common agreed policies for accessing resources and handling users' rights across different domains within VOs
The Globus Project
  Making Grid computing a reality
  Close collaboration with real Grid projects in science and industry
  Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
  Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
  The Globus Toolkit: Open source, reference software base for building grid infrastructure and applications
  Global Grid Forum: Development of standard protocols and APIs for Grid computing
Selected Major Grid Projects
  Name | URL | Sponsors | Focus
  Access Grid | www.mcs.anl.gov/FL/accessgrid | DOE, NSF | Create & deploy group collaboration systems using commodity technologies
  BlueGrid | (none given) | IBM | Grid testbed linking IBM laboratories
  DISCOM | www.cs.sandia.gov/discom | DOE Defense Programs | Create operational Grid providing access to resources at three U.S. DOE weapons laboratories
  DOE Science Grid | sciencegrid.org | DOE Office of Science | Create operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
  Earth System Grid (ESG) | earthsystemgrid.org | DOE Office of Science | Delivery and analysis of large climate model datasets for the climate research community
  European Union (EU) DataGrid | eu-datagrid.org | European Union | Create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics
Selected Major Grid Projects
  Name | URL | Sponsor | Focus
  EuroGrid, Grid Interoperability (GRIP) | eurogrid.org | European Union | Create tech for remote access to supercomp resources & simulation codes; in GRIP, integrate with Globus Toolkit
  Fusion Collaboratory | fusiongrid.org | DOE Off. Science | Create a national computational collaboratory for fusion research
  Globus Project | globus.org | DARPA, DOE, NSF, NASA, Msoft | Research on Grid technologies; development and support of Globus Toolkit; application and deployment
  GridLab | gridlab.org | European Union | Grid technologies and applications
  GridPP | gridpp.ac.uk | U.K. eScience | Create & apply an operational grid within the U.K. for particle physics research
  Grid Research Integration Dev. & Support Center | grids-center.org | NSF | Integration, deployment, support of the NSF Middleware Infrastructure for research & education
Selected Major Grid Projects
  Name | URL | Sponsor | Focus
  Grid Application Dev. Software | hipersoft.rice.edu/grads | NSF | Research into program development technologies for Grid applications
  Grid Physics Network | griphyn.org | NSF | Technology R&D for data analysis in physics expts: ATLAS, CMS, LIGO, SDSS
  Information Power Grid | ipg.nasa.gov | NASA | Create and apply a production Grid for aerosciences and other NASA missions
  International Virtual Data Grid Laboratory | ivdgl.org | NSF | Create international Data Grid to enable large-scale experimentation on Grid technologies & applications
  Network for Earthquake Eng. Simulation Grid | neesgrid.org | NSF | Create and apply a production Grid for earthquake engineering
  Particle Physics Data Grid | ppdg.net | DOE Science | Create and apply production Grids for data analysis in high energy and nuclear physics experiments
Selected Major Grid Projects
  Name | URL | Sponsor | Focus
  TeraGrid | teragrid.org | NSF | U.S. science infrastructure linking four major resource sites at 40 Gb/s
  UK Grid Support Center | grid-support.ac.uk | U.K. eScience | Support center for Grid projects within the U.K.
  Unicore | (none given) | BMBFT | Technologies for remote access to supercomputers
  Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS
  See also www.gridforum.org
Where is Development of the Grid Going?
  [Figure labels: Grid; Web]
  The definition of WSRF means that Grid and Web communities can move forward on a common base
Standards
  Grid and Web Services are merging
    Grid is an aggressive use case of Web Services
    WSRF completes common infrastructure
  Web Services standards landscape is in flux
    Uncertain status of security and policy standards continues to be a big source of concern
  Grid services standards landscape heating up
    Agreement, management, data access, ...
  Open source software important for adoption
Standards (cont)
  Open, standard protocols
    Enable interoperability
    Avoid product/vendor lock-in
    Enable innovation/competition on end points
    Enable ubiquity
  In Grid space, must address how to
    Describe, discover, and access resources
    Monitor, manage, and coordinate resources
    Account and charge for resources
    For many different types of resource
Standards (cont)
  SSL/TLS v1 (from OpenSSL) (IETF)
  LDAP v3 (from OpenLDAP) (IETF)
  X.509 Proxy Certificates (IETF)
  GridFTP v1.0 (GGF)
  WSDL 1.1, XML, SOAP (W3C)
  WS-Security (OASIS)
  OGSI v1.0 (GGF)
  And others on the road to standardization
    WSRF (OASIS), DAIS (GGF), WS-Agreement (GGF), WSDL 2.0, WSDM, SAML, XACML
WSRF Specifications
  List is still changing, but basically includes...
  Core:
    WS-Resource Framework (WSRF)
    WS-ResourceProperties (WSRF-RP)
    WS-ResourceLifetime (WSRF-RL)
    WS-ServiceGroup (WSRF-SG)
    WS-BaseFaults (WSRF-BF)
  Related:
    WS-Notification
    WS-Addressing
WSRF
  WSRF is a framework consisting of a number of specifications.
    WS-ResourceProperties
    WS-ResourceLifetime
    WS-ServiceGroup
    WS-Notification
    WS-BaseFaults
    WS-RenewableReferences (unpublished)
  Other WS specifications such as:
    WS-Addressing
How WSRF Fits in With Other Standards, Specifications and Protocols
  [Figure: layered stack]
    Grid stuff: Globus (GRAM, MDS)
    WSRF
    Web services: WSDL, SOAP
    Internet protocols: HTTP, TCP/IP
Describing Web Services
  Web Services Description Language (WSDL) 2.0
    Status: W3C Last Call Working Draft
    http://www.w3.org/TR/wsdl
  WSDL is for describing Web Services
    Defines XML-based grammar for describing network services as a set of endpoints
    Describes their methods, arguments, return values and how to use them
  Approach: Service Oriented Architecture (SOA)
    Service-Provider:
      Develop a Web Service and publish its description as WSDL
      Publish a link to it in a Service-Registry
    Service-Consumer:
      Service discovery, i.e. find WSDL, e.g. via Service-Registry
      Use endpoint definition (WSDL) to communicate with service
Web Services Addressing
  URIs (Uniform Resource Identifiers). Look like URLs:
    http://webservices.mysite.com/weather/us/WeatherService
  When you have a Web Service URI, you will usually need to give that URI to a program
  If you typed a Web Service URI into your web browser, you would probably get an error message or some unintelligible code
  Some services include a polite response page
Service-Oriented Architecture
  [Figure labels: Service Provider; Registry: Service Broker; Service Consumer; Endpoint Definition; Publish; Discovery; Bind]
Web Services Architecture
  WSDL: Core element of the Web Service Architecture stack (Endpoint definition language)
  Simplified Web Service Stack (WS-I Basic Profile 1.0 compliant)
    XML 1.0 + Namespaces (messaging)
    SOAP (messaging)
    XSD (service description)
    WSDL (service description)
    UDDI (service discovery)
  [Figure labels: Web Service; Listener; Responder]
WSDL Goals
  Extensibility with respect to
    New Transport protocols
    New Encoding rules
  Abstraction with respect to
    Endpoints and Messages
    THEN mapped onto n concrete transports and encodings
  Reuse with respect to
    Definitions reusable to create new definitions
Abstract Endpoint Type
  PortType (Abstract Endpoint Type)
    Set of message flows (operations) expected by a particular endpoint type
    No details relating to transport or encoding or location
    Possibly part of a WSDL specification
  [Figure: an Abstract Endpoint Type (PortType) containing four kinds of operation, each built from Messages: One-way, Request-Response, Notification, Solicit-Response]
Concrete Endpoint Type
  Binding (Concrete Endpoint Type)
    Defines transport and encoding particulars for a portType
  [Figure labels: Concrete Endpoint Type (Binding), shown twice; PortType; operations; Messages for operation; Transport & Encoding; URI]
Shift to Service Definition
  Port (Endpoint Instance)
    Network address of an endpoint and the binding it adheres to
    Note: not necessarily a TCP port
  Service
    A collection of related endpoint instances
  [Figure labels: Service; Endpoint Instance (Port), shown twice; Concrete Endpoint Type (Binding), shown twice; Host, shown twice]
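The relationship between the abstract and concrete WSDL layers can be modeled as plain data classes. This is an illustrative object model only, not a WSDL parser; the class layout, the calculator names, and the example.invalid addresses are assumptions. It shows one abstract PortType being reused by two Bindings, each exposed at its own Port, with both Ports grouped into one Service.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operation:
    name: str
    pattern: str  # "one-way", "request-response", "solicit-response" or "notification"
    input_message: Optional[str] = None
    output_message: Optional[str] = None


@dataclass
class PortType:
    """Abstract endpoint type: operations only, no transport or address."""
    name: str
    operations: List[Operation] = field(default_factory=list)


@dataclass
class Binding:
    """Concrete endpoint type: a PortType plus transport and encoding."""
    name: str
    port_type: PortType
    transport: str
    encoding: str


@dataclass
class Port:
    """Endpoint instance: a network address plus the binding it adheres to."""
    name: str
    binding: Binding
    address: str


@dataclass
class Service:
    """A collection of related endpoint instances."""
    name: str
    ports: List[Port] = field(default_factory=list)


# Hypothetical calculator service, described once and bound twice.
calculator = PortType(
    "CalculatorPortType",
    [Operation("Add", "request-response", "AddRequest", "AddResponse")],
)
soap_http = Binding("CalculatorSoapHttp", calculator, transport="HTTP", encoding="SOAP")
soap_smtp = Binding("CalculatorSoapSmtp", calculator, transport="SMTP", encoding="SOAP")
service = Service(
    "CalculatorService",
    [
        Port("HttpPort", soap_http, "http://example.invalid/calculator"),
        Port("MailPort", soap_smtp, "mailto:calculator@example.invalid"),
    ],
)
```

Keeping the operations in the PortType and the wire details in the Binding is what lets the same definitions be reused across transports.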
Describing Web Services
  All WSDL Elements belong to the WSDL namespace: http://schemas.xmlsoap.org/wsdl/
  Namespaces for WSDL Binding
    SOAP Binding: http://schemas.xmlsoap.org/wsdl/soap/
    HTTP GET and POST Binding: http://schemas.xmlsoap.org/wsdl/http/
    WSDL MIME binding: http://schemas.xmlsoap.org/wsdl/mime/
    More to come
State Management
  Core communication model of the Web (HTTP) is stateless
  Application requires state when a user traverses the multiple endpoints of a Web application/service
Web Service: Stateless
Web Service: Stateful
Web Service Invocation - Stateful
Web Service + WSRF = Stateful Resources = WS-Resource
  A stateful resource is something that exists even when you're not interacting with it.
    E.g. database backend service
  Stateful resources have properties that define state
    These properties are how you interact with them
    Properties have values
    Add/remove/change properties and values dynamically
  WSRF Specification: a WS-Resource is the combination of a Web service and a stateful resource on which it acts.
WS-Resource Approach to State
  Typical approach:
    Put the state in the Web service (thus making it stateful, which is generally regarded as a bad thing)
  WSRF approach:
    Store state in a separate entity called a resource
    Each resource has a unique key
    A Web service can have multiple resources
    To connect to service: URI + WS-Addressing Std
WS-Resources
  Web services often provide access to state
    Job submissions, databases
  A WS-Resource is a standard way of representing that state.
  In this tutorial, we will be using counter resources, which are simple accumulators.
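The WSRF idea of keeping state out of the service itself can be illustrated with a counter resource. The sketch below is plain Python, not the Globus Toolkit or any WSRF library: the class names, the key format, and the dictionary standing in for an endpoint reference are all assumptions. The service object holds no counter values; every call names the resource it acts on by key.

```python
import uuid


class ResourceHome:
    """Holds the stateful resources, each addressed by a unique key."""

    def __init__(self):
        self._resources = {}

    def create(self):
        key = str(uuid.uuid4())
        self._resources[key] = {"value": 0}  # resource properties
        return key

    def lookup(self, key):
        return self._resources[key]

    def destroy(self, key):
        del self._resources[key]


class CounterService:
    """Stateless front end: the state lives in the resource, not here."""

    def __init__(self, home):
        self._home = home

    def add(self, key, amount):
        resource = self._home.lookup(key)
        resource["value"] += amount
        return resource["value"]

    def get_value(self, key):
        return self._home.lookup(key)["value"]


home = ResourceHome()
service = CounterService(home)

# One service, two independent resources.
first = home.create()
second = home.create()
service.add(first, 5)
service.add(first, 2)
service.add(second, 10)

# A client would address a resource with the service URI plus its key
# (hypothetical URI; stands in for a WS-Addressing endpoint reference).
endpoint_reference = {"address": "http://example.invalid/counter", "resource_key": first}
```

Destroying a resource through the home corresponds loosely to the lifetime management that the WSRF specifications standardize.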
WS-Resources
  WSRF specifications provide:
    XML-based Resource Properties
    Lifetime management (creation/destruction) of resources
    Service groups, which group together WS-Resources
    Notification (for example of changes in resource properties)
    Faults
    Renewable References
Examples of WS-Resources
  Files on a file server
  Rows in a database
  Jobs in a job submission system
  Accounts in a bank
Session Design
  Session
    Defines a context in which a user communicates with a Web Application in a defined time period
    One Session per user
    Assigns application state to multiple requests from one user
  Design Decision / Rules of thumb
    Use a database to persist state
    UUID to identify a session/user
  Physical Design: Session identifier exchange
    Cookie, hidden variable, or encoded into the URL
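The two rules of thumb, a UUID as session identifier and a database to persist state, can be sketched with the Python standard library. The table layout, the class name, and the JSON encoding of state are assumptions chosen for brevity; how the identifier reaches the client (cookie, hidden variable, or URL) is left out.

```python
import json
import sqlite3
import uuid


class SessionStore:
    """Persists per-session state in a database, keyed by a UUID."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def create(self):
        """Start a session and return the identifier to hand to the client."""
        session_id = str(uuid.uuid4())
        with self._db:
            self._db.execute(
                "INSERT INTO sessions (id, state) VALUES (?, ?)",
                (session_id, json.dumps({})),
            )
        return session_id

    def load(self, session_id):
        """Return the stored state, or None for an unknown identifier."""
        row = self._db.execute(
            "SELECT state FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def save(self, session_id, state):
        with self._db:
            self._db.execute(
                "UPDATE sessions SET state = ? WHERE id = ?",
                (json.dumps(state), session_id),
            )


store = SessionStore()
sid = store.create()
store.save(sid, {"cart": [1, 2]})
```

Because the state lives in the database rather than in the handling process, any server that receives a later request with the same identifier can pick the session up.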
Transactions
  Transaction
    A unit of work that should either succeed or fail as a whole.
    A series of operations that behave according to the ACID rules.
    Series: BEGIN_TRANSACTION, Op1, ..., OpN, COMMIT_TRANSACTION
  ACID Rules define Atomicity, Consistency, Isolation, and Durability
  Characteristics regarding Web Applications
    Long Running
    Nested
Atomicity And Consistency
  Atomicity
    Transaction executes exactly once and is atomic
    All the work is done or none of it
  Consistency
    Transaction preserves the consistency of data
    Transforms one consistent state of the data into another consistent state
Isolation And Durability
  Isolation
    Transaction is a unit of isolation
    Concurrent transactions behave as though each was the only transaction running in the System
  Durability
    Transaction is a unit of recovery
    If a transaction commits, the system guarantees that its updates will persist, immediately after the commit.
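Atomicity and consistency are easiest to see in a small example. The sketch below uses Python's built-in sqlite3 module, whose connection object commits on success and rolls back on an exception when used as a context manager. The accounts table, the transfer function, and the "no negative balance" rule are assumptions made for illustration.

```python
import sqlite3


def transfer(db, source, target, amount):
    """Move `amount` between two accounts as one unit of work.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        with db:  # commit on success, roll back if an exception escapes
            db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            (lowest,) = db.execute("SELECT MIN(balance) FROM accounts").fetchone()
            if lowest < 0:
                # Consistency rule (assumed): no account may go negative.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(db):
    return dict(db.execute("SELECT name, balance FROM accounts"))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
db.commit()
```

When the second transfer in a sequence fails, neither of its two updates remains visible, which is the "all the work is done or none of it" property.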
Aspects of DSA
  Driven by communication aspects
  Performance issues
    Protocol overhead
    Bandwidth
    Quality of Service
    Delays
    Proxy, Cache and Mirrors
  Other Issues
    Security, availability, etc.
    Operational aspects
Simple Web Service Chain
  Web Service WS 1 provides functionality using WS 2, WS 2 provides ...
  Like a chain: The weakest element influences the overall behavior
  Hops: Represents the number of network nodes involved from the source WS to the destination WS.
  Example shows 2 Hops, 4 Web Services
  [Figure labels: WS 1; WS 2; WS 3; WS n]
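One way to quantify the "weakest element" remark is to treat a chain as a series system. The sketch below assumes that each service fails independently and that calls are made one after another, neither of which the slide states; the availability and latency figures are invented. Under those assumptions the chain's availability is the product of the individual availabilities and its latency is their sum.

```python
from math import prod


def chain_availability(availabilities):
    """Availability of a serial chain, assuming independent failures."""
    return prod(availabilities)


def chain_latency(latencies_ms):
    """End-to-end latency of a serial chain of sequential calls."""
    return sum(latencies_ms)


# Hypothetical figures for a chain of four Web services.
availabilities = [0.999, 0.99, 0.999, 0.90]
latencies_ms = [10, 20, 30, 40]

overall_availability = chain_availability(availabilities)
overall_latency_ms = chain_latency(latencies_ms)
```

The chain can never be more available than its least available member, and every additional link lowers the total further.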
Considering Scalability
  Scale Up: More power added to the machine
  Scale Out: The application logic unit is cloned across a set of identical servers
  [Figure labels: Scale Up; Scale Out]
Scale-Out and Partition
  Scale out Web Servers and scale up Database
  [Figure labels: Scale Up Database; Partition Database]
Partition Database
  Functional
    Each functional area of a site gets its own database
    Dedicated hardware to certain functions
    Class of hardware per function
  Tables
    Huge scale opportunity for large tables
    Some modern database management systems provide special support for this
  Read-only Databases
    Data changes do not occur often
    Use of Replicated Databases
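Functional and table partitioning both come down to a routing decision made before a query is sent. The sketch below shows one possible form of each: a lookup table for functional areas and a hash of the row key for splitting a large table. The database names, the area names, and the choice of SHA-256 are assumptions; database systems with built-in partitioning support do this internally.

```python
import hashlib

# Hypothetical mapping from functional area to its own database.
FUNCTIONAL_DATABASES = {
    "catalog": "db-catalog",
    "orders": "db-orders",
    "accounts": "db-accounts",
}


def database_for_function(area):
    """Functional partitioning: each area of the site has its own database."""
    return FUNCTIONAL_DATABASES[area]


def shard_for_key(key, shard_count):
    """Table partitioning: map a row key to one of `shard_count` partitions.

    A cryptographic digest is used instead of the built-in hash(), which is
    randomized per process for strings and so would not be stable.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


orders_db = database_for_function("orders")
customer_shard = shard_for_key("customer-42", 4)
```

A fixed modulus keeps the example short, but changing `shard_count` later would move most keys, so real deployments often prefer a scheme that limits such reshuffling.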
Dynamic WS Discovery
  Web Service calls Web Service mediated by Broker (respectively P2P network)
  Criteria may be quality, context, price, etc.
  Requires classification system or metadata
  Broker could use UDDI automatically on request
  P2P discovery by content-based routing (e.g. for WSDL)
  [Figure labels: WS 1; Broker / P2P-Network; WS x; WS y]
Integrating Endpoints
  Typical Problems
    No standard way to expose functionality
    Integration is expensive and error-prone
    Not designed for partnership scenarios
  Why?
    Semantics of content get lost on the way to presentation
    Need for semantics
Integrating Application Logic
  Goal: Federating Web Applications (respectively their Logical Units)
  Globalize the Component-based View
  Next Generation Web Applications will work together
  Extend processes with external (potentially unknown) partners
Federation Approach
  [Figure: four Web Applications connected through the Internet]
Federation Scenarios
  Distributed Computing / Web Services in use for:
    Mobile Virtual Enterprise
    Market-place, Supply Chain, Grid Computing (Grid of Web Services)
    Portals providing uniform Access to distributed Information Spaces
  Examples of Business Relationships:
    B2B: Business-to-Business
    B2C: Business-to-Consumer
    C2C: Consumer-to-Consumer
    B2A: Business-to-Administration
    A2C: Administration-to-Consumer
    A2A: Administration-to-Administration
Accessing Objects
  SOAP Version 1.2
    W3C Recommendation 24 June 2003
    Part 0 - Tutorial: http://www.w3.org/TR/soap12-part0/
    Part 1: Defines Messaging Framework
    Part 2: Adjuncts (may be used in messages)
  SOAP provides a simple and lightweight Mechanism for exchanging structured and typed Information between Peers in a decentralized, distributed Environment
  Formerly known as Simple Object Access Protocol
  Does not itself define any Application Semantics, e.g. Programming Model
SOAP
  SOAP consists of three Parts:
    SOAP envelope: Defines what is in a message, who should deal with it, and whether it is optional or mandatory
    SOAP encoding rules: Define a serialization mechanism for application-defined data types.
    SOAP RPC representation: Defines a convention that can be used to represent remote procedure calls and responses.
General Web Service Model
  [Figure labels: Consumer; Web Service (Provider); Transport (e.g. HTTP(S), SMTP, FTP); SOAP Message; Requestor; Parser; Listener; Responder; Process-Logic]
SOAP Message
  SOAP Protocol Layering
    SOAP
    Application Protocol (HTTP, SMTP, etc.)
    Transport Protocol (TCP/IP, IPX/SPX, etc.)
    Physical Protocol (Ethernet, ATM, etc.)
SOAP and Client/Server
  In order for SOAP to work, the client must have code running that is responsible for building the SOAP request.
  In response, a server must be responsible for understanding the SOAP request, invoking the specified method, building the response message, and returning it to the client.
  These details are up to you: your Web application
The HTTP Aspect
  A SOAP request is sent as an HTTP POST request
    POST /WebCalculator/Calculator.asmx HTTP/1.1
    Content-Type: text/xml
    ...
    SOAPAction: http://tempuri.org/Add
    Content-Length: 386
    ...
Message Structure
  SOAP Message: the complete SOAP message
    Headers: protocol binding headers
    SOAP Envelope: encloses payload
      SOAP Header: encloses headers
        Headers: individual headers
      SOAP Body: contains SOAP message name
        Message Name and Data: XML-encoded SOAP message name and data
SOAP Message Example
  An XML document using the SOAP schema:
... 12 10
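The markup of the slide's example did not survive, so the following is a reconstruction under stated assumptions rather than the original message. It builds a SOAP 1.1 envelope that carries the two surviving values, 12 and 10, to an Add operation, and wraps it in an HTTP POST like the one shown on the earlier HTTP slide. The operand element names `a` and `b`, the `http://tempuri.org/` body namespace, and the example.invalid host are guesses or placeholders; the envelope namespace is the standard SOAP 1.1 one. Nothing is sent over the network.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://tempuri.org/"                      # assumed service namespace


def build_add_envelope(a, b):
    """Serialize an Envelope/Body/Add message; operand names are hypothetical."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    add = ET.SubElement(body, f"{{{SERVICE_NS}}}Add")
    ET.SubElement(add, f"{{{SERVICE_NS}}}a").text = str(a)
    ET.SubElement(add, f"{{{SERVICE_NS}}}b").text = str(b)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def build_request(url, payload):
    """Prepare (but do not send) the HTTP POST that would carry the envelope."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": '"http://tempuri.org/Add"',
        },
        method="POST",
    )


payload = build_add_envelope(12, 10)
request = build_request(
    "http://example.invalid/WebCalculator/Calculator.asmx",  # placeholder host
    payload,
)
```

Passing `request` to `urllib.request.urlopen` would perform the call against a real endpoint; the Content-Length header is filled in at that point.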
Encoding Complex Data
  Data structures are serialized as XML:
Plastic Novelties Ltd 129 PLAS
Example of a SOAP Request
  SOAP message over HTTP-POST:
    POST /StockQuote HTTP/1.1
    Host: www.stockquoteserver.com
    Content-Type: text/xml; charset="utf-8"
    Content-Length: nnnn
    SOAPAction: "Some-URI"
DIS
A SOAP Response
  SOAP response over HTTP
    HTTP/1.1 200 OK
    Content-Type: text/xml; charset="utf-8"
    Content-Length: nnnn
34.5
Example of a SOAP Error
  SOAP response over HTTP
    HTTP/1.1 500 Internal Server Error
    Content-Type: text/xml; charset="utf-8"
    Content-Length: nnnn
SOAP: MustUnderstand SOAP Must Under Error
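On the client side, a response has to be checked for a Fault before its payload is used. The sketch below parses a SOAP 1.1 response with the standard library and raises an exception when the Body contains a Fault. The two sample documents are illustrative stand-ins modeled on the stock quote exchange, since the slides' XML was lost; the element names inside them, the exception class, and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


class SoapFault(Exception):
    """Raised when the response Body carries a Fault element."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_response(xml_text):
    """Return the first element inside the SOAP Body, or raise SoapFault."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("not a SOAP 1.1 envelope with a non-empty Body")
    payload = body[0]
    if payload.tag == f"{{{SOAP_ENV}}}Fault":
        # In SOAP 1.1 the fault sub-elements are not namespace-qualified.
        raise SoapFault(payload.findtext("faultcode"), payload.findtext("faultstring"))
    return payload


# Illustrative documents (assumed structure, not recovered from the slides).
OK_RESPONSE = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

FAULT_RESPONSE = """<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode>SOAP-ENV:MustUnderstand</faultcode>
      <faultstring>mandatory header not understood</faultstring>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

price = parse_response(OK_RESPONSE).findtext("Price")
```

An HTTP status of 500 usually accompanies a Fault, but checking the Body itself keeps the client correct for transports other than HTTP.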
Security and Features
  In context of HTTP builds on existing security
    HTTPS
    X.509 certificates
  Developers explicitly choose which methods to expose
  Extensibility: the major strength of SOAP
    E.g. check the WS-* specifications: http://msdn.microsoft.com/webservices
    Cf. WS-Security Roadmap
WS-Security Roadmap
  [Figure labels: Security; Policy; SecureConversation; Trust; Federation; Privacy; Authorization; SOAP Messaging]
Discovering Web Services
  Universal Description, Discovery, and Integration (UDDI)
    Specifies what the API for a Web-based Registry looks like.
    All about the Yellow, White & Green Pages
    Defines how to run and operate Registry Sites on the Web
    Defines how to pay for its Operation: encourages basic lookup services for free
    Further Information at http://uddi.org
Registry Operation
  Peer nodes (websites)
  Companies register with any node
  Registrations replicated on a daily basis
  Complete set of registered records available at all nodes
  Common set of SOAP APIs supported by all nodes
  Compliance enforced by business contract
  [Figure labels: UDDI.org; queries; Ariba; Microsoft; IBM; other]
Why a DNS-like Model?
  Enforces cross-platform compatibility across competitor platforms
  Demonstration of trust and openness
  Avoids tacit endorsement of any one vendor's platform
  May migrate to a third party
UDDI provides information
  Who: Business Information
  What: Find the right Type of Business
  Where: To Access a Service
  How: Describes how a given Interface functions
  Information provided at http://uddi.microsoft.com
UDDI: A Publisher View
UDDI and Web Services Discovery
  [Figure labels: Web Service Consumer; UDDI; Web Service Provider]
  Find a Service (http://www.uddi.org): link to discovery document
  Discovery (http://yourservice.com): HTML with link to WSDL
  How do we talk? (WSDL) (http://yourservice.com/?WSDL): return service descriptions (XML)
  Let me talk to you (SOAP) (http://yourservice.com/svc1): return service response (XML)
UDDI and SOAP
  [Figure labels: User; UDDI SOAP Request; UDDI SOAP Response; UDDI Registry Node: HTTP Server, SOAP Processor, UDDI Registry Service, B2B Directory]
  Create, View, Update, and Delete registrations
  Implementation-neutral
Registry APIs (SOAP)
  Inquiry API
    Find things: find_business, find_service, find_binding, find_tModel
    Get Details about things: get_businessDetail, get_serviceDetail, get_bindingDetail, get_tModelDetail
  Publishers API
    Save things: save_business, save_service, save_binding, save_tModel
    Delete things: delete_business, delete_service, delete_binding, delete_tModel
    Security: get_authToken, discard_authToken
Web Services Makes Sense For Grid Computing
  [Figure labels: Client requesting Grid Service; SOAP Message; HTTP Transport; VO Boundary or Network; Grid Service Provider; Interface in WSDL]
Why Should HPC Folks Care About the Grid?
  1) Grid is a disruptive technology [Vision]
    It ushers in a virtualized, collaborative, distributed world that our applications will use
  2) Grid addresses pain points now [Reality]
    Grids are built not bought, and are delivering real benefits
    The computational demands of our applications are not going to get simpler
  3) An open Grid is to our advantage [Future]
    Standards are being defined now that will determine the future of this technology
The Globus Project
  Making Grid computing a reality
  Close collaboration with real Grid projects in science and industry
  Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
  Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
  The Globus Toolkit: Open source, reference software base for building grid infrastructure and applications
  Global Grid Forum: Development of standard protocols and APIs for Grid computing
Some Important Definitions
  Resource
  Network protocol
  Network enabled service
  Application Programmer Interface (API)
  Software Development Kit (SDK)
  Syntax
  Not discussed, but important: policies
Resource
  An entity that is to be shared
    E.g., computers, storage, data, software
  Does not have to be a physical entity
    E.g., Condor pool, distributed file system, ...
  Defined in terms of interfaces, not devices
    E.g., schedulers such as LSF and PBS define a compute resource
    Open/close/read/write define access to a distributed file system, e.g. NFS, AFS, DFS
Network Protocol
  A formal description of message formats and a set of rules for message exchange
    Rules may define sequence of message exchanges
    Protocol may define state-change in endpoint, e.g., file system state change
  Good protocols designed to do one thing
    Protocols can be layered
  Examples of protocols
    IP, TCP, TLS (was SSL), HTTP, Kerberos
Network Enabled Services
  Implementation of a protocol that defines a set of capabilities
    Protocol defines interaction with service
    All services require protocols
    Not all protocols are used to provide services (e.g. IP, TLS)
  Examples: FTP and Web servers
Application Programming Interface
  A specification for a set of routines to facilitate application development
    Refers to definition, not implementation
    E.g., there are many implementations of MPI
  Spec often language-specific (or IDL)
    Routine name, number, order and type of arguments; mapping to language constructs
    Behavior or function of routine
  Examples
    GSS API (security), MPI (message passing)
Software Development Kit
  A particular instantiation of an API
  SDK consists of libraries and tools
    Provides implementation of API specification
  Can have multiple SDKs for an API
  Examples of SDKs
    MPICH, Motif Widgets
Syntax
  Rules for encoding information, e.g.
    XML, Condor ClassAds, Globus RSL
    X.509 certificate format (RFC 2459)
    Cryptographic Message Syntax (RFC 2630)
  Distinct from protocols
    One syntax may be used by many protocols (e.g., XML); & useful for other purposes
  Syntaxes may be layered
    E.g., Condor ClassAds -> XML -> ASCII
    Important to understand layerings when comparing or evaluating syntaxes
A Protocol can have Multiple APIs
  TCP/IP APIs include BSD sockets, Winsock, System V streams, ...
  The protocol provides interoperability: programs using different APIs can exchange information
  I don't need to know the remote user's API
  [Figure labels: Application, WinSock API; Application, Berkeley Sockets API; TCP/IP Protocol: Reliable byte streams]
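As an illustration of one protocol served by two APIs, here is a small Python sketch (an analogy, not taken from the slides): the server side uses the low-level Berkeley-style socket module, while the client uses the higher-level asyncio streams API. Both speak TCP, so they interoperate without either side knowing which API the other uses. The function names and the upper-casing "service" are made up for the example.

```python
import asyncio
import socket
import threading


def serve_once(listener):
    """Low-level sockets API: accept one connection, reply in upper case."""
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


async def stream_client(port, payload):
    """High-level asyncio streams API talking to the same TCP endpoint."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(payload)
    await writer.drain()
    reply = await reader.read(1024)
    writer.close()
    await writer.wait_closed()
    return reply


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    reply = asyncio.run(stream_client(port, b"hello grid"))
    server.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(demo())
```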
An API can have Multiple Protocols
  MPI provides portability: any correct program compiles & runs on a platform
  Does not provide interoperability: all processes must link against same SDK
    E.g., MPICH and LAM versions of MPI
APIs and Protocols are Both Important
  Standard APIs/SDKs are important
    They enable application portability
    But w/o standard protocols, interoperability is hard (every SDK speaks every protocol?)
  Standard protocols are important
    Enable cross-site interoperability
    Enable shared infrastructure
    But w/o standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)
Why Discuss Architecture?
  Descriptive
    Provide a common vocabulary for use when describing Grid systems
  Guidance
    Identify key areas in which services are required
  Prescriptive
    Define standard Intergrid protocols and APIs to facilitate creation of interoperable Grid systems and portable applications
One View of Requirements
  Identity & authentication
  Authorization & policy
  Resource discovery
  Resource characterization
  Resource allocation
  (Co-)reservation, workflow
  Distributed algorithms
  Remote data access
  High-speed data transfer
  Performance guarantees
  Monitoring
  Adaptation
  Intrusion detection
  Resource management
  Accounting & payment
  Fault management
  System evolution
  Etc.
Another View: Three Obstacles to Making Grid Computing Routine
  New approaches to problem solving
    Data Grids, distributed computing, peer-to-peer, collaboration grids, ...
  Structuring and writing programs
    Abstractions, tools
  Enabling resource sharing across distinct institutions
    Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; ...
Programming & Systems Problems
  The programming problem
    Facilitate development of sophisticated apps
    Facilitate code sharing
    Requires programming environments: APIs, SDKs, tools
  The systems problem
    Facilitate coordinated use of diverse resources
    Facilitate infrastructure sharing: e.g., certificate authorities, info services
    Requires systems: protocols, services
      E.g., port/service/protocol for accessing information, allocating resources
The Systems Problem: Resource Sharing Mechanisms That
  Address security and policy concerns of resource owners and users
  Are flexible enough to deal with many resource types and sharing modalities
  Scale to large number of resources, many participants, many program components
  Operate efficiently when dealing with large amounts of data & computation
Aspects of the Systems Problem
  Need for interoperability when different groups want to share resources
    Diverse components, policies, mechanisms
    E.g., standard notions of identity, means of communication, resource descriptions
  Need for shared infrastructure services to avoid repeated development, installation
    E.g., one port/service/protocol for remote access to computing, not one per tool/application
    E.g., Certificate Authorities: expensive to run
  A common need for protocols & services
A Protocol-Oriented View of Grid Architecture That Emphasizes
  Development of Grid protocols & services
    Protocol-mediated access to remote resources
    New services: e.g., resource brokering
    On the Grid = speak Intergrid protocols
    Mostly (extensions to) existing protocols
  Development of Grid APIs & SDKs
    Interfaces to Grid protocols & services
    Facilitate application development by supplying higher-level abstractions
  The (hugely successful) model is the Internet
Layered Grid Architecture (By Analogy to Internet Architecture)
Protocols, Services, and APIs Occur at Each Level
  [Figure labels: Applications; Languages/Frameworks; Collective Service APIs and SDKs; Collective Services; Collective Service Protocols; Resource APIs and SDKs; Resource Services; Resource Service Protocols; Connectivity APIs; Connectivity Protocols; Local Access APIs and Protocols; Fabric Layer]
Important Points
  Built on Internet protocols & services
    Communication, routing, name resolution, etc.
  Layering here is conceptual, does not imply constraints on who can call what
    Protocols/services/APIs/SDKs will, ideally, be largely self-contained
    Some things are fundamental: e.g., communication and security
    But, advantageous for higher-level functions to use common lower-level functions
The Hourglass Model
  Focus on architecture issues
    Propose set of core services as basic infrastructure
    Use to construct high-level, domain-specific solutions
  Design principles
    Keep participation cost low
    Enable local control
    Support for adaptation
    IP hourglass model
  [Figure labels: Applications; Diverse global services; Core services; Local OS]
Where Are We With Architecture?
  No official standards exist
  But:
    Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols
    GGF has an architecture working group
    Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information
    Internet drafts submitted in security area
Fabric Layer Protocols & Services
  Just what you would expect: the diverse mix of resources that may be shared
    Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.
  Few constraints on low-level technology: connectivity and resource level protocols form the neck in the hourglass
  Defined by interfaces not physical characteristics
Connectivity Layer Protocols & Services
  Communication
    Internet protocols: IP, DNS, routing, etc.
  Security: Grid Security Infrastructure (GSI)
    Uniform authentication, authorization, and message protection mechanisms in multi-institutional setting
    Single sign-on, delegation, identity mapping
    Public key technology, SSL, X.509, GSS-API
    Supporting infrastructure: Certificate Authorities, certificate & key management, ...
  GSI: www.gridforum.org/security
Resource Layer Protocols & Services
  Grid Resource Allocation Mgmt (GRAM)
    Remote allocation, reservation, monitoring, control of compute resources
  GridFTP protocol (FTP extensions)
    High-performance data access & transport
  Grid Resource Information Service (GRIS)
    Access to structure & state information
  Network reservation, monitoring, control
  All built on connectivity layer: GSI & IP
  GridFTP: www.gridforum.org
  GRAM, GRIS: www.globus.org
Collective Layer Protocols & Services
  Index servers aka metadirectory services
    Custom views on dynamic resource collections assembled by a community
  Resource brokers (e.g., Condor Matchmaker)
    Resource discovery and allocation
  Replica catalogs
  Replication services
  Co-reservation and co-allocation services
  Workflow management services
  Etc.
Example: High-Throughput Computing System
  App: High Throughput Computing System
  Collective (App): Dynamic checkpoint, job management, failover, staging
  Collective (Generic): Brokering, certificate authorities
  Resource: Access to data, access to computers, access to network performance data
  Connect: Communication, service discovery (DNS), authentication, authorization, delegation
  Fabric: Storage systems, schedulers
Example: Data Grid Architecture
  App: Discipline-Specific Data Grid Application
  Collective (App): Coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, ...
  Collective (Generic): Replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, ...
  Resource: Access to data, access to computers, access to network performance data, ...
  Connect: Communication, service discovery (DNS), authentication, authorization, delegation
  Fabric: Storage systems, clusters, networks, network caches, ...
The Programming Problem
  But how do I develop robust, secure, long-lived, well-performing applications for dynamic, heterogeneous Grids?
  I need, presumably:
    Abstractions and models to add to speed/robustness/etc. of development
    Tools to ease application development and diagnose common problems
    Code/tool sharing to allow reuse of code components developed by others
Grid Programming Technologies
  Grid applications are incredibly diverse (data, collaboration, computing, sensors, ...)
    Seems unlikely there is one solution
  Most applications have been written from scratch, with or without Grid services
  Application-specific libraries have been shown to provide significant benefits
  No new language, programming model, etc., has yet emerged that transforms things
    But certainly still quite possible
Examples of Grid Programming Technologies
  MPICH-G2: Grid-enabled message passing
  CoG Kits, GridPort: Portal construction, based on N-tier architectures
  GDMP, Data Grid Tools, SRB: replica management, collection management
  Condor-G: workflow management
  Legion: object models for Grid computing
  Cactus: Grid-aware numerical solver framework
  Note tremendous variety, application focus
MPICH-G2: A Grid-Enabled MPI
  A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
    Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
  Requires services for authentication, resource allocation, executable staging, output, etc.
  Programs run in wide area without change
  See also: MetaMPI, PACX, STAMPI, MAGPIE
  www.globus.org/mpi
Cactus (Allen, Dramlitsch, Seidel, Shalf, Radke)
  Modular, portable framework for parallel, multidimensional simulations
  Construct codes by linking
    Small core (flesh): mgmt services
    Selected modules (thorns): Numerical methods, grids & domain decomps, visualization and steering, etc.
  Custom linking/configuration tools
  Developed for astrophysics, but not astrophysics-specific
  [Figure labels: Cactus flesh; Thorns]
  www.cactuscode.org
High-Throughput Computing and Condor
  High-throughput computing
    CPU cycles/day (week, month, year?) under non-ideal circumstances
    How many times can I run simulation X in a month using all available machines?
  Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility
  Emphasis on policy management and reliability
Object-Based Approaches
  Grid-enabled CORBA
    NASA Lewis, Rutgers, ANL, others
    CORBA wrappers for Grid protocols
    Some initial successes
  Legion
    U.Virginia
    Object models for Grid components (e.g., vault=storage, host=computer)
Portals
  N-tier architectures enabling thin clients, with middle tiers using Grid functions
    Thin clients = Web browsers
    Middle tier = e.g. Java Server Pages, with Java CoG Kit, GPDK, GridPort utilities
    Bottom tier = various Grid resources
  Numerous applications and projects, e.g.
    Unicore, Gateway, Discover, Mississippi Computational Web Portal, NPACI Grid Port, Lattice Portal, Nimrod-G, Cactus, NASA IPG Launchpad, Grid Resource Broker, ...
Common Toolkit Underneath
  Each of these programming environments should not have to implement the protocols and services from scratch!
  Rather, want to share common code that
    Implements core functionality
      Software Development Kits (SDKs) that can be used to construct a large variety of services and clients
      Standard services that can be easily deployed
    Is robust, well-architected, self-consistent
    Is open source, with broad input
General Approach
  Define Grid protocols & APIs
    Protocol-mediated access to remote resources
    Integrate and extend existing standards
    On the Grid = speak Intergrid protocols
  Develop a reference implementation
    Client and server SDKs, services, tools, etc.
  Grid-enable wide variety of tools
  Learn through deployment and applications
Globus Toolkit
  A software toolkit addressing key technical problems in the development of Grid enabled tools, services, and applications
    Offer a modular bag of technologies
    Enable incremental development of grid-enabled tools and applications
    Implement standard Grid protocols and APIs
    Make available under liberal open source license
  Current version is 4.0, commonly referred to as GT4
Key Concepts for GT4
  OGSA, WSRF, and GT4
    These are basic architecture components for GT4
  Open Grid Services Architecture (OGSA)
  Web Services: OGSA, WSRF, and GT4 are based on standard Web Services technologies such as SOAP and WSDL. Need to be familiar with the Web Services architecture and languages.
  The Web Services Resource Framework: WSRF is the core of GT4.
Key Concepts for GT4 (cont)
  The GT4 Architecture: Based on WS-Resources and Web Services, and grid computing
  Java & XML: to use GT4, you need to be able to program in Java, and to understand basic XML.
OGSA Key Requirements
  Interoperability and Support for Dynamic and Heterogeneous Environments
  Resource Sharing Across Organizations
  Optimization
  Quality of Service (QoS) Assurance
  Job Execution
  Data Services
  Security
  Administrative Cost Reduction
  Scalability
  Availability
  Ease of Use and Extensibility
OGSA Defines Basic Capabilities
  Infrastructure Services
  Execution Management Services
  Data Services
  Resource Management Services
  Security Services
  Self-Management Services
  Information Services
  Security Considerations
OGSA, WSRF, and GT4
GT4 Roadmap
History and Motivation
  Do we want standard APIs?
    E.g. MPI (Message Passing Interface)
  But on the grid, we actually want standard wire protocols
    The API can be different on each system
History and Motivation (cont)
  Open Grid Services Infrastructure (OGSI)
    Global Grid Forum (GGF) standard
    Identified a number of common building blocks used in grid protocols
      Inspecting state, creating and removing state, detecting changes in state, naming state
    Defined standard ways to do these things, based on Web services (defined a thing called a Grid Service)
History and Motivation (cont)
  But then
    Realized that this was useful for Web services in general, not just for the grid
    Moved out of GGF, into OASIS
    Split the single OGSI specification into a number of other specifications called WSRF
Globus Toolkit
  Grid infrastructure software
  Four key protocols
    Security/Authentication (GSI)
    Resource Management/Scheduling (GRAM)
    Resource description (GRIS/GIIS)
    Data/File transfer (GASS, GridFTP)
Grid Security Infrastructure (GSI)
Security Terminology
  Authentication: Establishing identity
  Authorization: Establishing rights
  Message protection
    Message integrity
    Message confidentiality
  Non-repudiation
  Digital signature
  Accounting
  Certificate Authority (CA)
Why Grid Security is Hard
  Resources being used may be valuable & the problems being solved sensitive
  Resources are often located in distinct administrative domains
    Each resource has own policies & procedures
  Set of resources used by a single computation may be large, dynamic, and unpredictable
    Not just client/server, requires delegation
  It must be broadly available & applicable
    Standard, well-tested, well-understood protocols; integrated with wide variety of tools
GSI in Action
  Create Processes at A and B that Communicate & Access Files at C
  [Figure labels: User; Site A (Kerberos): Computer; Site B (Unix): Computer; Site C (Kerberos): Storage system]
Grid Security Requirements
Candidate Standards
  Kerberos 5
    Fails to meet requirements:
      Integration with various local security solutions
      User based trust model
  Transport Layer Security (TLS/SSL)
    Fails to meet requirements:
      Single sign-on
      Delegation
Grid Security Infrastructure (GSI)
  Extensions to standard protocols & APIs
    Standards: SSL/TLS, X.509 & CA, GSS-API
    Extensions for single sign-on and delegation
  Globus Toolkit reference implementation of GSI
    SSLeay/OpenSSL + GSS-API + SSO/delegation
    Tools and services to interface to local security
      Simple ACLs; SSLK5/PKINIT for access to K5, AFS; ...
    Tools for credential management
      Login, logout, etc.
      Smartcards
      MyProxy: Web portal login and delegation
      K5cert: Automatic X.509 certificate creation
Review of Public Key Cryptography
  Asymmetric keys
    A private key is used to encrypt data.
    A public key can decrypt data encrypted with the private key.
  An X.509 certificate includes
    Someone's subject name (user ID)
    Their public key
    A signature from a Certificate Authority (CA) that:
      Proves that the certificate came from the CA
      Vouches for the subject name
      Vouches for the binding of the public key to the subject
Public Key Based Authentication
  User sends certificate over the wire.
  Other end sends user a challenge string.
  User encodes the challenge string with private key
    Possession of private key means you can authenticate as subject in certificate
  Public key is used to decode the challenge.
    If you can decode it, you know the subject
  Treat your private key carefully!!
    Private key is stored only in well-guarded places, and only in encrypted form
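The challenge-response exchange above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how GSI or any real TLS stack does it: the RSA key is the textbook pair built from p=61 and q=53, which is far too small to be secure, there is no padding, and the function names digest, respond, and check are invented for the illustration.

```python
import hashlib
import secrets

# Toy RSA key pair (p=61, q=53). Illustration only, never use keys this small.
N = 3233   # public modulus
E = 17     # public exponent, published in the certificate
D = 2753   # private exponent, kept secret by the user


def digest(challenge: bytes) -> int:
    """Reduce the challenge to an integer below the modulus."""
    return int.from_bytes(hashlib.sha256(challenge).digest(), "big") % N


def respond(challenge: bytes, d: int = D) -> int:
    """User side: 'encode' the challenge with the private key."""
    return pow(digest(challenge), d, N)


def check(challenge: bytes, response: int, e: int = E) -> bool:
    """Verifier side: 'decode' with the public key and compare."""
    return pow(response, e, N) == digest(challenge)


if __name__ == "__main__":
    challenge = secrets.token_bytes(16)   # fresh random challenge string
    print(check(challenge, respond(challenge)))
```

Only the holder of D can produce a response that passes check, which is why possession of the private key is treated as proof of identity.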
X.509 Proxy Certificate
  Defines how a short term, restricted credential can be created from a normal, long-term X.509 credential
    A proxy certificate is a special type of X.509 certificate that is signed by the normal end entity cert, or by another proxy
    Supports single sign-on & delegation through impersonation
    Currently an IETF draft
User Proxies
  Minimize exposure of user's private key
  A temporary, X.509 proxy credential for use by our computations
    We call this a user proxy certificate
    Allows process to act on behalf of user
    User-signed user proxy cert stored in local file
    Created via grid-proxy-init command
  Proxy's private key is not encrypted
    Rely on file system security, proxy certificate file must be readable only by the owner
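Because the proxy's private key is stored unencrypted, the file permissions are the only protection. A minimal Python sketch of the kind of check a tool could make before trusting such a file follows; the helper name is_private_to_owner is hypothetical, and this is not the check the Globus tools actually implement.

```python
import os
import stat


def is_private_to_owner(path):
    """True if neither group nor others have any access bits on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    import tempfile

    # Stand-in for a proxy credential file; the real location is not assumed here.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        proxy_path = handle.name
    os.chmod(proxy_path, 0o600)
    print(is_private_to_owner(proxy_path))
    os.remove(proxy_path)
```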
Delegation
  Remote creation of a user proxy
  Results in a new private key and X.509 proxy certificate, signed by the original key
  Allows remote process to act on behalf of the user
  Avoids sending passwords or private keys across the network
Globus Security APIs
  Generic Security Service (GSS) API
    IETF standard
    Provides functions for authentication, delegation, message protection
    Decoupled from any particular communication method
  But GSS-API is somewhat complicated, so we also provide the easier-to-use globus_gss_assist API.
  GSI-enabled SASL is also provided
Results
  GSI adopted by 100s of sites, 1000s of users
    Globus CA has issued >3000 certs (user & host), >1500 currently active; other CAs active
  Rollouts are currently underway all over:
    NSF Teragrid, NASA Information Power Grid, DOE Science Grid, European Data Grid, etc.
  Integrated in research & commercial apps
    GrADS testbed, Earth Systems Grid, European Data Grid, GriPhyN, NEESgrid, etc.
  Standardization begun in Global Grid Forum, IETF
GSI Applications
  Globus Toolkit uses GSI for authentication
  Many Grid tools, directly or indirectly, e.g.
    Condor-G, SRB, MPICH-G2, Cactus, GDMP, ...
  Commercial and open source tools, e.g.
    ssh, ftp, cvs, OpenLDAP, OpenAFS
    SecureCRT (Win32 ssh client)
  And since we use standard X.509 certificates, they can also be used for
    Web access, LDAP server access, etc.
Ongoing and Future GSI Work
  Protection against compromised resources
    Restricted delegation, smartcards
  Standardization
  Scalability in numbers of users & resources
    Credential management
    Online credential repositories (MyProxy)
    Account management
  Authorization
    Policy languages
    Community authorization
Proxy Certificate Standards Work
  Internet Public Key Infrastructure X.509 Proxy Certificate Profile
    draft-ietf-pkix-proxy-01.txt
    Draft being considered by IETF PKIX working group, and by GGF GSI working group
    Defines proxy certificate format, including restricted rights and delegation tracing
  Demonstrated a prototype of restricted proxies at HPDC (August 2001) as part of CAS demo
GSS-API Extensions Work
  4 years of GSS-API experience, while on the whole quite positive, has shed light on various deficiencies of GSS-API
  GSS-API Extensions
    draft-ggf-gss-extensions-04.txt
    Draft being considered by GGF GSI working group. Not yet submitted to IETF.
    Defines extensions to the GSS-API to better support Grid security
GSS-API Extensions
  Credential export/import
    Allows delegated credentials to be externalized
    Used for checkpointing a service
  Delegation at any time, in either direction
    More rich options on use of delegation
  Restricted delegation handling
    Add proxy restrictions to delegated cred
    Inspect auth cert for restrictions
  Allow better mapping of GSS to TLS
    Support TLS framing of messages
Community Authorization Service
  Question: How does a large community grant its users access to a large set of resources?
    Should minimize burden on both the users and resource providers
  Community Authorization Service (CAS)
    Community negotiates access to resources
    Resource outsources fine-grain authorization to CAS
    Resource only knows about CAS user credential
      CAS handles user registration, group membership
    User who wants access to resource asks CAS for a capability credential
      Restricted proxy of the CAS user cred., checked by resource
Community Authorization (Prototype shown August 2001)
  [Figure labels: User]
Community Authorization Service
  CAS provides user community with information needed to authenticate resources
    Sent with capability credential, used on connection with resource
    Resource identity (DN), CA
  This allows new resources/users (and their CAs) to be made available to a community through the CAS without action on the other users'/resources' part
Authorization API
  Service providers need to perform authorization policy evaluation on:
    Local policies
    Policies contained in restricted proxies
  We are working on 2 API layers:
    Low level GAA-API implementation for evaluation of policies
    High level, very simple authorization API that can easily be embedded into services
  Still in early prototyping stage
Passport Online CA & MyProxy
  Requiring users to manage their own certs and keys is annoying and error prone
  A solution: Leverage Passport global authentication to obtain a proxy credential
    Passport provides
      Globally unique user name (email address)
      Method of verifying ownership of the name (authentication)
      Re-issuance (e.g. forgotten password)
    Passport credentials can be presented to an online CA or credential repository
      Creates and issues new (restricted) proxy certificate to the user on demand
Other Future Security Work
  Ease-of-use
    Improved error message, online CA, etc.
  Improved online credential repositories
    See MyProxy paper at HPDC
  Support for multiple user credentials
  Multi-factor authentication
  Subordinate certificate authorities for domains
    Ease issuance of host certs for domains
  Independent Data Unit Support
Security Summary
  GSI successfully addresses wide variety of Grid security issues
    Broad acceptance, deployment, integration with tools
    Standardization on-going in IETF & GGF
  Ongoing R&D to address next set of issues
  For more information:
    www.globus.org/research/papers.html
      A Security Architecture for Computational Grids
      Design and Deployment of a National-Scale Authentication Infrastructure
    www.gridforum.org/security
Grid Resource Allocation Management (GRAM)
The Challenge
  Enabling secure, controlled remote access to heterogeneous computational resources and management of remote computation
    Authentication and authorization
    Resource discovery & characterization
    Reservation and allocation
    Computation monitoring and control
  Addressed by new protocols & services
    GRAM protocol as a basic building block
    Resource brokering & co-allocation services
    GSI for security, MDS for discovery
Resource Management
  The Grid Resource Allocation Management (GRAM) protocol and client API allows programs to be started on remote resources, despite local heterogeneity
  Resource Specification Language (RSL) is used to communicate requirements
  A layered architecture allows application-specific resource brokers and co-allocators to be defined in terms of GRAM services
    Integrated with Condor, PBS, MPICH-G2, ...
Resource Management Architecture
  [Diagram. Labels: Application; RSL; RSL specialization; Queries & Info; Information Service; Ground RSL; Simple ground RSL; GRAM (three instances); Local resource managers: LSF, Condor, NQE]
Resource Specification Language
  Common notation for exchange of information between components
    Syntax similar to MDS/LDAP filters
  RSL provides two types of information:
    Resource requirements: Machine type, number of nodes, memory, etc.
    Job configuration: Directory, executable, args, environment
  Globus Toolkit provides an API/SDK for manipulating RSL
RSL Syntax
  Elementary form: parenthesis clauses
    (attribute op value [ value ... ] )
  Operators Supported:
    <, <=, =, >=, >, !=
  Some supported attributes:
    executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
  Unknown attributes are passed through
    May be handled by subsequent tools
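The elementary clause form can be illustrated with a small parser. This is only a sketch in Python, not part of the Globus RSL API: the function name parse_relation is hypothetical, the operator set is assumed to be the six usual comparison operators, and quoted values containing spaces are not handled.

```python
import re

# Comparison operators assumed for an elementary RSL relation.
# Two-character operators come first so that "<=" is not read as "<".
OPERATORS = ("!=", "<=", ">=", "=", "<", ">")

_OP_PATTERN = "|".join(re.escape(op) for op in OPERATORS)
_RELATION = re.compile(
    r"^\(\s*(?P<attr>\w+)\s*(?P<op>" + _OP_PATTERN + r")\s*(?P<values>.+?)\s*\)$"
)


def parse_relation(clause):
    """Split one '(attribute op value [value ...])' clause into its parts.

    Hypothetical helper for illustration; values are split on whitespace.
    """
    match = _RELATION.match(clause.strip())
    if match is None:
        raise ValueError(f"not an elementary RSL relation: {clause!r}")
    return match.group("attr"), match.group("op"), match.group("values").split()


print(parse_relation("(count>=5)"))
print(parse_relation("(arguments=arg1 arg2)"))
```

Compound expressions that start with &, | or + are rejected by this sketch; they would need a recursive parser on top of it.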
Constraints: &
  For example:
    & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog)
  Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours
Disjunction: |
  For example:
    & (executable=myprog) ( | (&(count=5)(memory>=64)) (&(count=10)(memory>=32)))
  Create 5 instances of myprog on a machine that has at least 64MB of memory, or 10 instances on a machine with at least 32MB of memory
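Writing nested & and | expressions by hand is error-prone, so a script might compose them from small helpers. The following Python sketch rebuilds the disjunction example above in compact form; the helper names rel, all_of, any_of and top_level are hypothetical and are not the globus_rsl API.

```python
def rel(attribute, op, value):
    """One elementary relation, e.g. (count=5)."""
    return f"({attribute}{op}{value})"


def all_of(*clauses):
    """Conjunction (&): every clause must hold."""
    return "(&" + "".join(clauses) + ")"


def any_of(*clauses):
    """Disjunction (|): at least one clause must hold."""
    return "(|" + "".join(clauses) + ")"


def top_level(expression):
    """Drop the outer parentheses, as in the request strings on the slides."""
    return expression[1:-1]


request = top_level(
    all_of(
        rel("executable", "=", "myprog"),
        any_of(
            all_of(rel("count", "=", 5), rel("memory", ">=", 64)),
            all_of(rel("count", "=", 10), rel("memory", ">=", 32)),
        ),
    )
)
print(request)
```

Keeping every sub-expression fully parenthesized until the last step avoids unbalanced parentheses when expressions are nested.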
GRAM Protocol Evolution
  GRAM-1: Simple HTTP-based RPC
    Job request
      Returns a job contact: Opaque string that can be passed between clients, for access to job
    Job cancel, status, signal
    Event notification (callbacks) for state changes
      Pending, active, done, failed, suspended
  GRAM-1.5 (U Wisconsin contribution)
    Add reliability improvements
      Once-and-only-once submission
      Recoverable job manager service
      Reliable termination detection
  GRAM-2: Moving to Web Services (SOAP)
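The job contact and state-change callbacks can be pictured with a small client-side model. This is a Python sketch under stated assumptions, not the globus_gram_client API: the JobHandle class, its methods and the contact string are hypothetical, and only the five state names come from the slide. No legal-transition rules are enforced because the slide does not give any.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"
    SUSPENDED = "suspended"


class JobHandle:
    """Client-side view of a submitted job (hypothetical, for illustration)."""

    def __init__(self, contact):
        self.contact = contact          # opaque string identifying the job
        self.state = JobState.PENDING
        self._callbacks = []

    def on_state_change(self, callback):
        """Register callback(contact, old_state, new_state)."""
        self._callbacks.append(callback)

    def update(self, new_state):
        """Record a reported state and notify listeners only on a real change."""
        if new_state is self.state:
            return
        old_state, self.state = self.state, new_state
        for callback in self._callbacks:
            callback(self.contact, old_state, new_state)


job = JobHandle("job-contact-placeholder")  # made-up contact string
job.on_state_change(lambda c, old, new: print(c, old.value, "->", new.value))
job.update(JobState.ACTIVE)
job.update(JobState.DONE)
```

Because the contact is just a string, it can be handed to another client, which can then build its own handle for the same job.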
Globus Toolkit Implementation
  Gatekeeper
    Single point of entry
    Authenticates user, maps to local security environment, runs service
    In essence, a secure inetd
  Job manager
    A gatekeeper service
    Layers on top of local resource management system (e.g., PBS, LSF, etc.)
    Handles remote interaction with the job
GRAM Components
  [Diagram. Labels: Client; MDS client API calls to locate resources; MDS client API calls to get resource info; GRAM client API calls to request resource allocation and process creation; GRAM client API state change callbacks; Site boundary; Grid Security Infrastructure; Gatekeeper; Job Manager; RSL Library; Create; Parse; Request; Allocate & create processes; Monitor & control; Local Resource Manager; Process (three instances); Query current status of resource; MDS: Grid Resource Info Server; MDS: Grid Index Info Server]
Co-allocation
  Simultaneous allocation of a resource set
    Handled via optimistic co-allocation based on free nodes or queue prediction
    In the future, advance reservations will also be supported (already in prototype)
  Globus APIs/SDKs support the co-allocation of specific multi-requests
    Uses a Globus component called the Dynamically Updated Request Online Co-allocator (DUROC)
Multirequest: +
  A multirequest allows us to specify multiple resource needs, for example
    + (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2))
    Execute 5 instances of p1 on a machine with at least 64M of memory
    Execute p2 on a machine with an ATM connection
  Multirequests are central to co-allocation
A Co-allocation Multirequest
  +( & (resourceManagerContact="flash.isi.edu:754:/C=US//CN=flash.isi.edu-fork")
       (count=1)
       (label="subjob A")
       (executable=my_app1)
   )
   ( & (resourceManagerContact="sp139.sdsc.edu:8711:/C=US//CN=sp097.sdsc.edu-lsf")
       (count=2)
       (label="subjob B")
       (executable=my_app2)
   )
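A multirequest of this shape can also be generated rather than typed. The Python sketch below is illustrative only: subrequest and multirequest are hypothetical helper names, not DUROC or globus_rsl functions, and the rule of quoting only values that contain whitespace is an assumption made for this example.

```python
def _render(value):
    """Quote a value only when it contains whitespace (assumed rule)."""
    text = str(value)
    return f'"{text}"' if any(ch.isspace() for ch in text) else text


def subrequest(**attributes):
    """One '&' conjunction built from attribute=value pairs, in call order."""
    clauses = "".join(f"({name}={_render(value)})" for name, value in attributes.items())
    return f"(&{clauses})"


def multirequest(*subrequests):
    """Join subrequests under the '+' operator."""
    if not subrequests:
        raise ValueError("a multirequest needs at least one subrequest")
    return "+" + "".join(subrequests)


request = multirequest(
    subrequest(count=1, label="subjob A", executable="my_app1"),
    subrequest(count=2, label="subjob B", executable="my_app2"),
)
print(request)
```

A resourceManagerContact attribute would be passed the same way; it is left out here so that the example does not invent a contact string.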
Job Submission Interfaces
  Globus Toolkit includes several command line programs for job submission
    globus-job-run: Interactive jobs
    globus-job-submit: Batch/offline jobs
    globusrun: Flexible scripting infrastructure
  Others are building better interfaces
    General purpose
      Condor-G, PBS, GRD, Hotpage, etc.
    Application specific
      ECCE, Cactus, Web portals
globus-job-run
  For running of interactive jobs
  Additional functionality beyond rsh
    Ex: Run 2 process job w/ executable staging
      globus-job-run -: host -np 2 -s myprog arg1 arg2
    Ex: Run 5 processes across 2 hosts
      globus-job-run \
        -: host1 -np 2 -s myprog.linux arg1 \
        -: host2 -np 3 -s myprog.aix arg2
  For list of arguments run:
    globus-job-run -help
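When a script drives globus-job-run, the multi-host form can be assembled as an argument list. The Python sketch below only builds and prints the command line and does not run it; job_run_argv is a hypothetical helper, and it assumes the "-:" subjob separator with the -np (process count) and -s (stage executable) options as used in the examples above.

```python
import shlex


def job_run_argv(subjobs):
    """Build a globus-job-run argument list.

    subjobs is a list of (host, process_count, executable, args) tuples;
    each one becomes a '-: host -np N -s executable args...' group.
    """
    argv = ["globus-job-run"]
    for host, process_count, executable, args in subjobs:
        argv += ["-:", host, "-np", str(process_count), "-s", executable, *args]
    return argv


argv = job_run_argv([
    ("host1", 2, "myprog.linux", ["arg1"]),
    ("host2", 3, "myprog.aix", ["arg2"]),
])
print(shlex.join(argv))
```

Passing the list form to a process launcher avoids shell quoting problems when arguments contain spaces.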
globus-job-submit
  For running of batch/offline jobs
    globus-job-submit: Submit job
      Same interface as globus-job-run
      Returns immediately
    globus-job-status: Check job status
    globus-job-cancel: Cancel job
    globus-job-get-output: Get job stdout/err
    globus-job-clean: Cleanup after job
globusrun
  Flexible job submission for scripting
    Uses an RSL string to specify job request
    Contains an embedded globus-gass-server
      Defines GASS URL prefix in RSL substitution variable:
        (stdout=$(GLOBUSRUN_GASS_URL)/stdout)
    Supports both interactive and offline jobs
  Complex to use
    Must write RSL by hand
    Must understand its esoteric features
    Generally you should use globus-job-* commands instead
Resource Management APIs
  The globus_gram_client API provides access to all of the core job submission and management capabilities, including callback capabilities for monitoring job status.
  The globus_rsl API provides convenience functions for manipulating and constructing RSL strings.
  The globus_gram_myjob API allows multi-process jobs to self-organize and to communicate with each other.
  The globus_duroc_control and globus_duroc_runtime APIs provide access to multirequest (co-allocation) capabilities.
Advance Reservation and Other Generalizations
  General-purpose Architecture for Reservation and Allocation (GARA)
    2nd generation resource management services
  Broadens GRAM on two axes
    Generalize to support various resource types
      CPU, storage, network, devices, etc.
    Advance reservation of resources, in addition to allocation
  Currently a research prototype
GARA: The Big Picture
Grid Information Services
  System information is critical to operation of the grid and construction of applications
    What resources are available?
      Resource discovery
    What is the state of the grid?
      Resource selection
    How to optimize resource use?
      Application configuration and adaptation
  We need a general information infrastructure to answer these questions
Examples of Useful Information
  Characteristics of a compute resource
    IP address, software available, system administrator, networks connected to, OS version, load
  Characteristics of a network
    Bandwidth and latency, protocols, logical topology
  Characteristics of the Globus infrastructure
    Hosts, resource managers
Grid Information: Facts of Life
  Information is always old
    Time of flight, changing system state
    Need to provide quality metrics
  Distributed state hard to obtain
    Complexity of global snapshot
  Components will fail
  Scalability and overhead
  Many different usage scenarios
    Heterogeneous policy, different information organizations, etc.
Grid Information Service
  Provide access to static and dynamic information regarding system components
  A basis for configuration and adaptation in heterogeneous, dynamic environments
  Requirements and characteristics
    Uniform, flexible access to information
    Scalable, efficient access to dynamic data
    Access to multiple information sources
    Decentralized maintenance
Two Classes Of Information Servers
  Resource Description Services
    Supplies information about a specific resource (e.g. Globus 1.1.3 GRIS).
  Aggregate Directory Services
    Supplies collection of information which was gathered from multiple GRIS servers (e.g. Globus 1.1.3 GIIS).
    Customized naming and indexing
Information Protocols
  Grid Resource Registration Protocol
    Support information/resource discovery
    Designed to support machine/network failure
  Grid Resource Inquiry Protocol
    Query resource description server for information
    Query aggregate server for information
    LDAP V3.0 in Globus 1.1.3
GIS Architecture
  [Diagram. Labels: Users; Enquiry Protocol; Customized Aggregate Directories (A, A); Registration Protocol; Standard Resource Description Services (R, R, R, R)]
Metacomputing Directory Service
  Use LDAP as Inquiry
  Access information in a distributed directory
    Directory represented by collection of LDAP servers
    Each server optimized for particular function
  Directory can be updated by:
    Information providers and tools
    Applications (i.e., users)
    Backend tools which generate info on demand
  Information dynamically available to tools and applications
Two Classes Of MDS Servers
  Grid Resource Information Service (GRIS)
    Supplies information about a specific resource
    Configurable to support multiple information providers
    LDAP as inquiry protocol
  Grid Index Information Service (GIIS)
    Supplies collection of information which was gathered from multiple GRIS servers
    Supports efficient queries against information which is spread across multiple GRIS servers
    LDAP as inquiry protocol
LDAP Details
  Lightweight Directory Access Protocol
    IETF Standard
    Stripped down version of X.500 DAP protocol
    Supports distributed storage/access (referrals)
    Supports authentication and access control
  Defines:
    Network protocol for accessing directory contents
    Information model defining form of information
    Namespace defining how information is referenced and organized
MDS Components
  LDAP 3.0 Protocol Engine
    Based on OpenLDAP with custom backend
    Integrated caching
  Information providers
    Delivers resource information to backend
  APIs for accessing & updating MDS contents
    C, Java, PERL (LDAP API, JNDI)
  Various tools for manipulating MDS contents
    Command line tools, Shell scripts & GUIs
GRIS/GIIS
Grid Resource Information Service
  Server which runs on each resource
    Given the resource DNS name, you can find the GRIS server (well known port = 2135)
  Provides resource specific information
    Much of this information may be dynamic
      Load, process information, storage information, etc.
    GRIS gathers this information on demand
  White pages lookup of resource information
    Ex: How much memory does machine have?
  Yellow pages lookup of resource options
    Ex: Which queues on machine allow large jobs?
Grid Index Information Service
  GIIS describes a class of servers
    Gathers information from multiple GRIS servers
    Each GIIS is optimized for particular queries
      Ex1: Which Alliance machines are >16-processor SGIs?
      Ex2: Which Alliance storage servers have >100Mbps bandwidth to host X?
    Akin to web search engines
  Organization GIIS
    The Globus Toolkit ships with one GIIS
    Caches GRIS info with long update frequency
      Useful for queries across an organization that rely on relatively static information (Ex1 above)
    Can be merged into GRIS
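The idea of an index server that caches per-resource information and answers organization-wide queries from the cache can be sketched as follows. This is a Python illustration under stated assumptions, not the MDS implementation: the IndexCache class, the fetch callable standing in for a GRIS query, the 300-second lifetime and the "cpus" attribute are all hypothetical.

```python
import time


class IndexCache:
    """Caches per-resource info and re-fetches it only after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # callable: resource name -> info dict
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the behaviour can be tested
        self._entries = {}         # resource name -> (fetch time, info)

    def lookup(self, resource):
        now = self._clock()
        cached = self._entries.get(resource)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # still fresh: no call to the resource
        info = self._fetch(resource)
        self._entries[resource] = (now, info)
        return info

    def query(self, resources, predicate):
        """Return the names of resources whose info satisfies predicate."""
        return [name for name in resources if predicate(self.lookup(name))]


# Made-up data standing in for what two resource servers would report.
inventory = {"sgi-a": {"cpus": 32}, "sgi-b": {"cpus": 8}}
index = IndexCache(inventory.__getitem__, ttl_seconds=300)
print(index.query(["sgi-a", "sgi-b"], lambda info: info["cpus"] > 16))
```

A long lifetime suits slowly changing facts such as processor counts; rapidly changing values such as load would need a short lifetime or a direct query to the resource.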
Information Services API
  RFC 1823 defines an IETF draft standard client API for accessing LDAP databases
    Connect to server
    Pose query which returns data structures containing sets of object classes and attributes
    Functions to walk these data structures
  Globus does not provide an LDAP API. We recommend the use of OpenLDAP, an open source implementation of RFC 1823.
Searching an LDAP Directory
  grid-info-search [options] filter [attributes]
  Default grid-info-search options
    -h mds.globus.org    MDS server
    -p 389               MDS port
    -b o=Grid            search start point
    -T 30                LDAP query timeout
    -s sub               scope = subtree
      alternatives:
        base: lookup this entry
        one: lookup immediate children
Searching a GRIS Server
  grid-info-host-search [options] filter [attributes]
  Exactly like grid-info-search, except defaults:
    -h localhost    GRIS server
    -p 2135         GRIS port
  Example:
    grid-info-host-search -h pitcairn dn=* dn
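The relationship between the two search commands and their defaults can be written down explicitly. The Python sketch below only constructs the argument list and does not contact any server; search_argv and the two defaults dictionaries are hypothetical names, while the option letters and default values are the ones listed on the two slides above.

```python
# Defaults listed for grid-info-search.
SEARCH_DEFAULTS = {
    "host": "mds.globus.org",
    "port": 389,
    "base": "o=Grid",
    "timeout": 30,
    "scope": "sub",
}

# grid-info-host-search differs only in host and port.
HOST_SEARCH_DEFAULTS = {**SEARCH_DEFAULTS, "host": "localhost", "port": 2135}


def search_argv(command, defaults, search_filter, attributes=(), **overrides):
    """Spell out every option so the effective settings are visible."""
    options = {**defaults, **overrides}
    return [
        command,
        "-h", options["host"],
        "-p", str(options["port"]),
        "-b", options["base"],
        "-T", str(options["timeout"]),
        "-s", options["scope"],
        search_filter,
        *attributes,
    ]


argv = search_argv(
    "grid-info-host-search", HOST_SEARCH_DEFAULTS, "dn=*", ["dn"], host="pitcairn"
)
print(" ".join(argv))
```

When such a command is typed into a shell, a filter like dn=* should be quoted so that the shell does not expand the asterisk.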
Data Grid Problem
  Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data
  Note that this problem:
    Is common to many areas of science
    Overlaps strongly with other Grid problems
Data Intensive Issues Include
  Harness [potentially large numbers of] data, storage, network resources located in distinct administrative domains
  Respect local and global policies governing what can be used for what
  Schedule resources efficiently, again subject to local and global constraints
  Achieve high performance, with respect to both speed and reliability
  Catalog software and virtual data
Data Intensive Computing and Grids
  The term Data Grid is often used
    Unfortunate as it implies a distinct infrastructure, which it isn't; but easy to say
  Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation,
    Security, resource mgt, info services, etc.
  Important to exploit commonalities as very unlikely that multiple infrastructures can be maintained
  Fortunately this seems easy to do!
Examples of Desired Data Grid Functionality
  High-speed, reliable access to remote data
  Automated discovery of best copy of data
  Manage replication to improve performance
  Co-schedule compute, storage, network
  Transparency wrt delivered performance
  Enforce access control on data
  Allow representation of global resource allocation policies
A Model Architecture for Data Grids
  [Diagram. Labels: Application; Attribute Specification; Metadata Catalog; Logical Collection and Logical File Name; Replica Catalog; Multiple Locations; Replica Selection; Performance Information & Predictions; NWS; MDS; Selected Replica; GridFTP Control Channel; GridFTP Data Channel; Replica Location 1; Replica Location 2; Replica Location 3; Tape Library; Disk Cache (two instances); Disk Array]
Globus Toolkit Components
  Two major Data Grid components:
  1. Data Transport and Access
    Common protocol
    Secure, efficient, flexible, extensible data movement
    Family of tools supporting this protocol
  2. Replica Management Architecture
    Simple scheme for managing:
      multiple copies of files
      collections of files
Motivation for a Common Data Access Protocol
  Existing distributed data storage systems
    DPSS, HPSS: focus on high-performance access, utilize parallel data transfer, striping
    DFS: focus on high-volume usage, dataset replication, local caching
    SRB: connects heterogeneous data collections, uniform client interface, metadata queries
  Problems
    Incompatible (and proprietary) protocols
      Each requires a custom client
      Partitions available data sets and storage devices
    Each protocol has subset of desired functionality
A Common, Secure, Efficient Data Access Protocol
  Common, extensible transfer protocol
    Common protocol means all can interoperate
  Decouple low-level data transfer mechanisms from the storage service
  Advantages:
    New, specialized storage systems are automatically compatible with existing systems
    Existing systems have richer data transfer functionality
  Interface to many storage systems
    HPSS, DPSS, file systems
    Plan for SRB integration
Access/Transport Protocol Requirements
  Suite of communication libraries and related tools that support
    GSI, Kerberos security
    Third-party transfers
    Parameter set/negotiate
    Partial file access
    Reliability/restart
    Large file support
    Data channel reuse
    Integrated instrumentation
    Logging/audit trail
    Parallel transfers
    Striping (cf DPSS)
    Policy-based access control
    Server-side computation
    Proxies (firewall, load bal)
  All based on a standard, widely deployed protocol
GridFTP and Grid Access to Secondary Storage (GASS)
GridFTP
  Why FTP?
    Ubiquity enables interoperation with many commodity tools
    Already supports many desired features, easily extended to support others
    Well understood and supported
  We use the term GridFTP to refer to
    Transfer protocol which meets requirements
    Family of tools which implement the protocol
  Note GridFTP > FTP
  Note that despite name, GridFTP is not restricted to file transfer!
GridFTP: Basic Approach
  FTP protocol is defined by several IETF RFCs
  Start with most commonly used subset
    Standard FTP: get/put etc., 3rd-party transfer
  Implement standard but often unused features
    GSS binding, extended directory listing, simple restart
  Extend in various ways, while preserving interoperability with existing servers
    Striped/parallel data channels, partial file, automatic & manual TCP buffer setting, progress monitoring, extended restart
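Parallel data channels and partial file access both rest on addressing a file by byte ranges. The Python sketch below shows one way a client might divide a file among several channels; split_ranges is a hypothetical helper and this is not the GridFTP wire format, only the arithmetic behind the idea.

```python
def split_ranges(total_bytes, channels):
    """Divide total_bytes into at most `channels` contiguous (offset, length) parts.

    Earlier parts absorb the remainder, so lengths differ by at most one byte.
    Zero-length parts are dropped when there are more channels than bytes.
    """
    if total_bytes < 0 or channels < 1:
        raise ValueError("total_bytes must be >= 0 and channels >= 1")
    base, extra = divmod(total_bytes, channels)
    ranges = []
    offset = 0
    for index in range(channels):
        length = base + (1 if index < extra else 0)
        if length == 0:
            continue
        ranges.append((offset, length))
        offset += length
    return ranges


print(split_ranges(10, 3))
```

Each (offset, length) pair could then be requested independently, which is also what makes restarting an interrupted transfer possible without resending completed ranges.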
GridFTP Protocol Specifications
  Existing standards
    RFC 959: File Transfer Protocol
    RFC 2228: FTP Security Extensions
    RFC 2389: Feature Negotiation for the File Transfer Protocol
    Draft: FTP Extensions
  New drafts
    GridFTP: Protocol Extensions to FTP for the Grid
      Grid Forum Data Working Group
GridFTP vs. WebDAV
  WebDAV extends HTTP for remote data access
    Combines control and data over single channel
  FTP splits control and data
    Supports multiple, user selectable data channel protocols
  Advantage to split channels
    Third party transfers handled cleanly
    Can (cleanly) define new data channel protocols
      E.g. parallel/striped transfer, automatic TCP buffer/window negotiation, non-TCP based protocols, etc.
    Amenable to high-performance proxies
      E.g. For firewalls, load balancing, etc.
The GridFTP Family of Tools
  Patches to existing FTP code
    GSI-enabled versions of existing FTP client and server, for high-quality production code
  Custom-developed libraries
    Implement full GridFTP protocol, targeting custom use, high-performance
  Custom-developed tools
    Servers and clients with specialized functionality and performance
A Word on GASS
  The Globus Toolkit provides services for file and executable staging and I/O redirection that work well with GRAM. This is known as Globus Access to Secondary Storage (GASS).
  GASS uses GSI-enabled HTTP as the protocol for data transfer, and a caching algorithm for copying data when necessary.
  The globus_gass, globus_gass_transfer, and globus_gass_cache APIs provide programmer access to these capabilities, which are already integrated with the GR