Introduction to GRID Computing

Post on 30-Dec-2015

  • Introduction to GRID Computing
    Bebo White, bebo@slac.stanford.edu
    New Directions in Information Technology Series

    Contra Costa College

    Fall 2005

  • Today's Goals
    To provide an introduction to key Grid computing and Web services issues, techniques, and technologies
    To provide a substantial background and vocabulary to support future studies in Grid computing and Web services
    To describe some of the current applications of Grid computing
    To describe some of the current Grid computing initiatives

  • Grid Hype

  • The Power Grid: On-Demand Access to Electricity
    Decouple production & consumption, enabling
        On-demand access
        Economies of scale
        Consumer flexibility
        New devices
    [Figure: axes labeled "Time" and "Quality, economies of scale"]

  • The Shape of Grids to Come?

  • A Grid Checklist (#1)
    A system that coordinates resources that are not subject to centralized control
    Integrates and coordinates resources and users that live within different control domains (for example, the user's desktop vs. central computing; different administrative units of the same company; or different companies) and addresses the issues of security, policy, payment, membership, and so forth that arise in these settings.
    Otherwise, we are dealing with a local management system.
    (Ian Foster)

  • A Grid Checklist (#2)
    A system that uses standard, open, general-purpose protocols and interfaces
    Is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access.
    It is important that these protocols and interfaces be standard and open.
    Otherwise, we are dealing with an application-specific system.
    (Ian Foster)

  • A Grid Checklist (#3)
    A system that delivers nontrivial qualities of service
    Allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, relating, for example, to response time, throughput, availability, and security, and/or co-allocation of multiple resource types to meet complex user demands, so that the utility of the combined system is significantly greater than the sum of its parts.
    (Ian Foster)

  • What is Grid Computing?
    Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations [I. Foster]
    A VO is a collection of users sharing similar needs and requirements in their access to processing, data and distributed resources and pursuing similar goals.
    Key concept: ability to negotiate resource-sharing arrangements among a set of participating parties (providers and consumers) and then to use the resulting resource pool for some purpose [I. Foster]

  • The Grid Problem
    Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
        From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations"
    Enable communities (virtual organizations) to share geographically distributed resources as they pursue common goals, assuming the absence of
        central location,
        central control,
        omniscience,
        existing trust relationships.

  • Elements of the Problem
    Resource sharing
        Computers, storage, sensors, networks, ...
        Sharing always conditional: issues of trust, policy, negotiation, payment, ...
    Coordinated problem solving
        Beyond client-server: distributed data analysis, computation, collaboration, ...
    Dynamic, multi-institutional virtual orgs
        Community overlays on classic org structures
        Large or small, static or dynamic

  • The Grid Information Problem
    There is a need for different views of the information depending upon
        VO membership
        Security constraints
        Intended purpose
        Etc.

  • Why Grids?
    Scale of the problems/applications
        Solving problems that are bigger than any one data center can hold
    Size of user communities
        Leading research in many different fields today requires collaborations that span research centers and countries (i.e., multi-domain access to distributed resources)
    Need to provide access to large data processing power and huge data storage

  • What Kinds of Applications?
    Computation intensive
        Interactive simulation (climate modeling)
        Large-scale simulation (galaxy formation, gravity waves, battlefield simulation)
        Engineering (parameter studies, linked models)
    Data intensive
        Experimental data analysis (high energy physics)
        Image, sensor analysis (astronomy, climate)
    Distributed collaboration
        Online instruments (microscopes, x-ray devices)
        Remote visualization (climate studies, biology)
        Engineering (structural testing, chemical)

  • Online Access to Scientific Instruments
    DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
    [Figure labels: Advanced Photon Source; real-time collection; tomographic reconstruction; archival storage; wide-area dissemination; desktop & VR clients with shared controls]

  • Mathematicians Solve NUG30
    Looking for the solution to the NUG30 quadratic assignment problem
        The problem involves assigning 30 facilities to 30 fixed locations so as to minimize the total cost of transferring material between the facilities.
    An informal collaboration of mathematicians and computer scientists
    Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
    14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23

    MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
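
    To make the quadratic assignment objective concrete, here is a minimal sketch on a toy instance. The 3x3 FLOW and DISTANCE matrices and the function names are hypothetical and are not NUG30 data. Brute force works only for tiny sizes: with 30 facilities there are 30! (about 2.65e32) assignments, which is why the real problem needed a Grid-scale effort and a far smarter algorithm. The last lines derive the average processor count implied by the slide's figures.

```python
from itertools import permutations


def qap_cost(assignment, flow, distance):
    """Total transfer cost when facility i sits at location assignment[i]."""
    n = len(assignment)
    return sum(
        flow[i][j] * distance[assignment[i]][assignment[j]]
        for i in range(n)
        for j in range(n)
    )


def solve_qap_brute_force(flow, distance):
    """Try every assignment; only feasible for very small n (n! candidates)."""
    n = len(flow)
    return min(permutations(range(n)), key=lambda p: qap_cost(p, flow, distance))


# Hypothetical toy instance (assumption for illustration, not NUG30 data).
FLOW = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
DISTANCE = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]

best = solve_qap_brute_force(FLOW, DISTANCE)
best_cost = qap_cost(best, FLOW, DISTANCE)

# Average number of busy processors implied by the slide:
# 3.46E8 CPU seconds delivered over 7 days of wall-clock time.
average_processors = 3.46e8 / (7 * 24 * 3600)

print(best, best_cost, round(average_processors))
```

    The average comes out well below the quoted peak of 1009 processors, which fits a pool whose machines join and leave over the week.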

  • Home Computers Evaluate AIDS Drugs
    Community =
        1000s of home computer users
        Philanthropic computing vendor (Entropia)
        Research group (Scripps)
    Common goal = advance AIDS research

  • Network for Earthquake Engineering Simulation
    NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
    On-demand access to experiments, data streams, computing, archives, collaboration
    NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

  • The LHC Detectors
    CMS, ATLAS, LHCb
    ~6-8 PetaBytes / year
    ~10^8 events/year
    ~10^3 batch and interactive users
    High Energy Physics
    Federico Carminati, EU review presentation

  • Data Grids for High Energy Physics
    Image courtesy Harvey Newman, Caltech

  • Solving Large Problems Pre-Grid
    Once upon a time...
        Mainframe
        Mini Computer
        Microcomputer
        Cluster
    (by Christophe Jacquet)

  • The Grid Distributed Computing Idea
    ...and today
    (by Christophe Jacquet)

  • Differences Between Grids and Distributed Applications
    Huge distributed applications already exist, but they tend to be specialized systems intended for a single purpose or user group, e.g., SETI@Home, FightAIDS@Home
    Grids go further and take into account:
        Different kinds of resources
            Not always the same hardware, data and applications
            No parallelization required
        Different kinds of interactions
            User groups or applications want to interact with Grids in different ways
        Dynamic nature
            Resources and users added/removed/changed frequently

  • The Grid Vision

  • Broader Context
    Grid Computing has much in common with major industrial thrusts
        Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing
    Sharing issues not adequately addressed by existing technologies
        Complicated requirements: run program X at site Y subject to community policy P, providing access to data at Z according to policy Q
        High performance: unique demands of advanced and high-performance systems

  • Grid Types - Physical
    Cluster Grid
    Enterprise Grid
    Global Grid

  • Grid Types - Logical
    Data Grid: responds to requests for computers and data stores; similar to (but more secure and auditable than) today's research grids
    Information Grid: responds to requests for computational processes that may require several data sources and processing stages to deliver a desired result
    Knowledge Grid: responds to high-level questions and finds the appropriate processes to deliver answers in the required form

  • The Classical (early) Grid
    Focused on applications where data was stored in files
        Little support for transactions, relational database access or distributed query processing
    Exploits a range of protocols such as:
        LDAP for directory services and file store queries
        GridFTP for large-scale reliable data transfer
        SSL for security

  • Why Now?
    Moore's law improvements in computing produce highly functional end systems
    The Internet and burgeoning wired and wireless networks provide universal connectivity
    Changing modes of working and problem solving emphasize teamwork, computation
    Network exponentials produce dramatic changes in geometry and geography

  • Network Exponentials
    Network vs. computer performance
        Computer speed doubles every 18 months
        Network speed doubles every 9 months
        Difference = order of magnitude per 5 years
    1986 to 2000
        Computers: x 500
        Networks: x 340,000
    2001 to 2010
        Computers: x 60
        Networks: x 4000
    Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner, Caufield and Perkins.
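
    The "order of magnitude per 5 years" claim follows directly from the two doubling times. A minimal sketch of the arithmetic, assuming ideal exponential growth (the function name is introduced here for illustration; the multi-year factors quoted on the slide come from its source and are not reproduced exactly by this idealized model):

```python
def growth_factor(years, doubling_months):
    """Growth over `years` for a quantity that doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


computer_5y = growth_factor(5, 18)  # roughly 10x
network_5y = growth_factor(5, 9)    # roughly 100x
gap_5y = network_5y / computer_5y   # roughly 10x: one order of magnitude

print(f"computers x{computer_5y:.1f}, networks x{network_5y:.1f}, gap x{gap_5y:.1f}")
```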

  • The 13.6 TF TeraGrid: Computing at 40 Gb/s
    [Figure: four sites, each with site resources and external networks; archival storage labeled HPSS and UniTree]
        NCSA/PACI: 8 TF, 240 TB
        SDSC: 4.1 TF, 225 TB
        Caltech
        Argonne
    TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne

  • iVDGL: International Virtual Data Grid Laboratory
    U.S. PIs: Avery, Foster, Gardner, Newman, Szalay

  • Main Services of a Grid Architecture
    Service providers
        Publish the availability of their services via information systems
        Such services may come and go or change dynamically
        E.g. a testbed site that offers x CPUs and y GB of storage
    Service brokers
        Register and categorize published services and provide search capabilities
        E.g. 1) SLAC Resource Broker selects the best site for a job 2) Catalogues of data held at each testbed site
    Service requesters
        Single sign-on: log into the Grid once
        Use brokering services to find a needed service and employ it
        E.g. CMS physicists submit a simulation job that needs 12 CPUs for 6 hours and 15 GB which gets scheduled, via the Resource Broker, on the CERN testbed site
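
    The provider/broker/requester roles can be sketched in a few lines. This is a toy model under stated assumptions, not how any real resource broker works: the class names, the site names and capacities, and the "most free CPUs wins" selection policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteOffer:
    """What a service provider publishes (hypothetical fields)."""
    name: str
    free_cpus: int
    free_storage_gb: int


@dataclass
class JobRequest:
    """What a service requester asks for."""
    cpus: int
    hours: int
    storage_gb: int


class ResourceBroker:
    """Registers published offers and matches jobs against them."""

    def __init__(self):
        self._offers = {}

    def publish(self, offer):
        self._offers[offer.name] = offer

    def withdraw(self, name):
        # Services may come and go, so the registry must allow removal.
        self._offers.pop(name, None)

    def select_site(self, job):
        candidates = [
            offer for offer in self._offers.values()
            if offer.free_cpus >= job.cpus and offer.free_storage_gb >= job.storage_gb
        ]
        if not candidates:
            return None
        # Assumed policy: prefer the site with the most free CPUs.
        return max(candidates, key=lambda offer: offer.free_cpus)


broker = ResourceBroker()
broker.publish(SiteOffer("cern-testbed", free_cpus=64, free_storage_gb=500))
broker.publish(SiteOffer("small-site", free_cpus=8, free_storage_gb=200))

job = JobRequest(cpus=12, hours=6, storage_gb=15)
chosen = broker.select_site(job)
print(chosen.name)

broker.withdraw("cern-testbed")
after_withdrawal = broker.select_site(job)
```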

  • Grid Security
    Resource providers are essentially opening themselves up to itinerant users
    Secure access to resources is required
    X.509 Public Key Infrastructure
        User's identity has to be certified by (mutually recognized) national Certification Authorities (CAs)
        Resources (node machines) have to be certified by CAs
        Temporary delegation from users to processes to be executed in the user's name (proxy certificates)
    Commonly agreed policies for accessing resources and handling users' rights across different domains within VOs

  • The Globus Project: Making Grid computing a reality
    Close collaboration with real Grid projects in science and industry
    Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
    Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
    The Globus Toolkit: Open source, reference software base for building grid infrastructure and applications
    Global Grid Forum: Development of standard protocols and APIs for Grid computing

  • Selected Major Grid Projects

    DataGrid is a project funded by the European Union

    Name | URL/Sponsor | Focus
    EuroGrid, Grid Interoperability (GRIP) | eurogrid.org; European Union | Create tech for remote access to supercomp resources & simulation codes; in GRIP, integrate with Globus Toolkit
    Fusion Collaboratory | fusiongrid.org; DOE Off. Science | Create a national computational collaboratory for fusion research
    Globus Project | globus.org; DARPA, DOE, NSF, NASA, Msoft | Research on Grid technologies; development and support of Globus Toolkit; application and deployment
    GridLab | gridlab.org; European Union | Grid technologies and eScience Create & apply an operational grid within the U.K. for particle physics research
    Grid Research Integration Dev. & Support Center | grids-center.org; NSF | Integration, deployment, support of the NSF Middleware Infrastructure for research & education

  • Selected Major Grid Projects

    See also

    Name | URL/Sponsor | Focus
    TeraGrid | teragrid.org; NSF | U.S. science infrastructure linking four major resource sites at 40 Gb/s
    UK Grid Support | eScience | Support center for Grid projects within the U.K.
    Unicore | BMBFT | Technologies for remote access to supercomputers

  • Where is Development of the Grid Going?
    [Figure labels: Grid; Web]
    The definition of WSRF means that Grid and Web communities can move forward on a common base

  • Standards
    Grid and Web Services are merging
        Grid is an aggressive use case of Web Services
        WSRF completes common infrastructure
    Web Services standards landscape is in flux
        Uncertain status of security and policy standards continues to be a big source of concern
    Grid services standards landscape heating up
        Agreement, management, data access, ...
    Open source software important for adoption

  • Standards (cont)
    Open, standard protocols
        Enable interoperability
        Avoid product/vendor lock-in
        Enable innovation/competition on end points
        Enable ubiquity
    In Grid space, must address how to
        Describe, discover, and access resources
        Monitor, manage, and coordinate resources
        Account and charge for resources
    For many different types of resource

  • Standards (cont)
    SSL/TLS v1 (from OpenSSL) (IETF)
    LDAP v3 (from OpenLDAP) (IETF)
    X.509 Proxy Certificates (IETF)
    GridFTP v1.0 (GGF)
    WSDL 1.1, XML, SOAP (W3C)
    WS-Security (OASIS)
    OGSI v1.0 (GGF)
    And others on the road to standardization
        WSRF (OASIS), DAIS (GGF), WS-Agreement (GGF), WSDL 2.0, WSDM, SAML, XACML

  • WSRF Specifications
    List is still changing, but basically includes...
    Core:
        WS-Resource Framework (WSRF)
        WS-ResourceProperties (WSRF-RP)
        WS-ResourceLifetime (WSRF-RL)
        WS-ServiceGroup (WSRF-SG)
        WS-BaseFaults (WSRF-BF)
    Related:
        WS-Notification
        WS-Addressing

  • WSRF
    WSRF is a framework consisting of a number of specifications.
        WS-Resource Properties
        WS-Resource Lifetime
        WS-Service Groups
        WS-Notification
        WS-BaseFaults
        WS-Renewable References (unpublished)

    Other WS specifications such as:
        WS-Addressing

  • How WSRF Fits in With Other Standards, Specifications and Protocols
    [Figure: layered stack, top to bottom]
        Grid stuff: Globus (GRAM, MDS)
        WSRF
        Web services: WSDL, SOAP
        Internet protocols: HTTP, TCP/IP

  • Describing Web Services
    Web Services Description Language (WSDL) 2.0
        Status: W3C Last Call Working Draft
    WSDL is for describing Web Services
        Defines XML-based grammar for describing network services as a set of endpoints
        Describes their methods, arguments, return values and how to use them
    Approach: Service Oriented Architecture (SOA)
        Service-Provider:
            Develop a Web Service and publish its description as WSDL
            Publish a link to it in a Service-Registry
        Service-Consumer:
            Service discovery, i.e. find WSDL, e.g. via Service-Registry
            Use endpoint definition (WSDL) to communicate with service
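
    As a rough illustration of what a consumer does with a service description, the sketch below parses a tiny WSDL document and lists its operations. It uses WSDL 1.1 element names (portType, operation, input, output) rather than the WSDL 2.0 draft, and the CounterService description, its namespace URI, and the function name are hypothetical, written only for this example.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical service description (assumption for illustration).
COUNTER_WSDL = """
<definitions name="CounterService"
             targetNamespace="http://example.org/counter"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="http://example.org/counter">
  <message name="AddRequest"/>
  <message name="AddResponse"/>
  <portType name="CounterPortType">
    <operation name="add">
      <input message="tns:AddRequest"/>
      <output message="tns:AddResponse"/>
    </operation>
    <operation name="reset">
      <input message="tns:AddRequest"/>
    </operation>
  </portType>
</definitions>
"""

# The order of input/output children determines the WSDL 1.1 operation kind.
KINDS = {
    ("input", "output"): "request-response",
    ("output", "input"): "solicit-response",
    ("input",): "one-way",
    ("output",): "notification",
}


def list_operations(wsdl_text):
    """Map each operation name in the document to its operation kind."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    path = f"{{{WSDL_NS}}}portType/{{{WSDL_NS}}}operation"
    for operation in root.iterfind(path):
        flow = tuple(
            child.tag.rsplit("}", 1)[-1]
            for child in operation
            if child.tag in (f"{{{WSDL_NS}}}input", f"{{{WSDL_NS}}}output")
        )
        operations[operation.get("name")] = KINDS.get(flow, "unknown")
    return operations


operations = list_operations(COUNTER_WSDL)
print(operations)
```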

  • Web Services Addressing
    URIs (Uniform Resource Identifiers)
        Look like URLs
    If you have a Web Service URI, you will usually need to give that URI to a program
        If you typed a Web Service URI into your web browser, you would probably get an error message or some unintelligible code
        Some services include a polite response page

  • Service-Oriented Architecture
    [Figure labels: Service Provider; Registry: Service Broker; Service Consumer; Publish; Endpoint Definition; Discovery; Bind]

  • Web Services Architecture
    WSDL: Core element of the Web Service Architecture stack (Endpoint definition language)
    Simplified Web Service Stack (WS-I Basic Profile 1.0 compliant)
        UDDI (service discovery)
        WSDL (service description)
        XSD (service description)
        SOAP (messaging)
        XML 1.0 + Namespaces (messaging)
    [Figure labels: Listener; Responder; Web Service]

  • WSDL Goals
    Extensibility with respect to
        New transport protocols
        New encoding rules
    Abstraction with respect to
        Endpoints and Messages
        THEN mapped onto n concrete transports and encodings
    Reuse with respect to
        Definitions reusable to create new definitions

  • Abstract Endpoint Type
    PortType (Abstract Endpoint Type)
        Set of message flows (operations) expected by a particular endpoint type
        No details relating to transport or encoding or location
    Possibly part of a WSDL specification
    [Figure: messages grouped into four kinds of operation: one-way operation, request-response operation, notification operation, solicit-response operation]


  • Concrete Endpoint Type
    Binding (Concrete Endpoint Type)
        Defines transport and encoding particulars for a portType
    [Figure labels: PortType; operation; messages for operation; transport & encoding; Concrete Endpoint Type (Binding); URI]

  • Shift to Service Definition
    Port (Endpoint Instance)
        Network address of an endpoint and the binding it adheres to
        Note: not necessarily a TCP port

    A WS-Resource is a standard way of representing that state.

    In this tutorial, we will be using counter resources which are simple accumulators.

  • WS-Resources
    WSRF specifications provide:
        XML-based Resource Properties
        Lifetime management (creation/destruction) of resources
        Service groups, which group together WS-Resources
        Notification (for example, of changes in resource properties)
        Faults
        Renewable References

  • Examples of WS-Resources
    Files on a file server
    Rows in a database
    Jobs in a job submission system
    Accounts in a bank

  • Session Design
    Session
        Defines a context in which a user communicates with a Web Application in a defined time period
        One session per user
        Assigns application state to multiple requests from one user
    Design decisions / rules of thumb
        Use a database to persist state
        UUID to identify a session/user
    Physical design: session identifier exchange
        Cookie, hidden variable, or encoded into the URL
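
    The two rules of thumb (persist state in a database, identify the session with a UUID) can be sketched as follows. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for a real one and JSON as the state encoding; the SessionStore class and its table layout are hypothetical. How the identifier travels (cookie, hidden variable, URL) is left out.

```python
import json
import sqlite3
import uuid


class SessionStore:
    """Keeps per-session application state in a database table."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def create(self):
        """Start a session and return its identifier (a random UUID)."""
        session_id = str(uuid.uuid4())
        self._db.execute(
            "INSERT INTO sessions (id, state) VALUES (?, ?)", (session_id, json.dumps({}))
        )
        self._db.commit()
        return session_id

    def load(self, session_id):
        """Return the stored state, or None for an unknown identifier."""
        row = self._db.execute(
            "SELECT state FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def save(self, session_id, state):
        self._db.execute(
            "UPDATE sessions SET state = ? WHERE id = ?", (json.dumps(state), session_id)
        )
        self._db.commit()


store = SessionStore(sqlite3.connect(":memory:"))
session_id = store.create()

# Two "requests" from the same user share state through the identifier.
state = store.load(session_id)
state["counter"] = state.get("counter", 0) + 1
store.save(session_id, state)

restored = store.load(session_id)
unknown = store.load("not-a-session")
```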

  • Transactions
    Transaction
        A unit of work that should either succeed or fail as a whole
        A series of operations that behave according to the ACID rules
        Series: BEGIN_TRANSACTION, Op1, ..., OpN, COMMIT_TRANSACTION
    ACID rules define Atomicity, Consistency, Isolation, and Durability
    Characteristics regarding Web Applications
        Long running
        Nested

  • Atomicity And Consistency
    Atomicity
        Transaction executes exactly once and is atomic
        All the work is done or none of it
    Consistency
        Transaction preserves the consistency of data
        Transforming one consistent state of data results in another consistent state

  • Isolation And Durability
    Isolation
        Transaction is a unit of isolation
        Concurrent transactions behave as though each was the only transaction running in the system
    Durability
        Transaction is a unit of recovery
        If a transaction commits, the system guarantees that its updates will persist immediately after the commit
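
    Atomicity is the easiest of the four properties to demonstrate. The sketch below uses Python's sqlite3 module, whose connection object commits when a `with` block succeeds and rolls back when an exception escapes it. The accounts table, the names, and the amounts are hypothetical.

```python
import sqlite3


def transfer(db, source, target, amount):
    """Move `amount` between accounts; both updates apply or neither does."""
    try:
        with db:  # commit on success, roll back if an exception escapes
            db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source)
            )
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target)
            )
            (balance,) = db.execute(
                "SELECT balance FROM accounts WHERE name = ?", (source,)
            ).fetchone()
            if balance < 0:
                # Consistency rule: no negative balances. Raising undoes both updates.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
db.commit()

succeeded = transfer(db, "alice", "bob", 30)   # fits within alice's balance
rejected = transfer(db, "alice", "bob", 500)   # would overdraw, so it is rolled back

balances = dict(db.execute("SELECT name, balance FROM accounts"))
print(succeeded, rejected, balances)
```

    The rejected transfer leaves no trace: the credit to the second account is undone together with the debit.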

  • Aspects of DSA
    Driven by communication aspects
    Performance issues
        Protocol overhead
        Bandwidth
    Quality of Service
        Delays
        Proxy, Cache and Mirrors
    Other issues
        Security, availability, etc.
    Operational aspects

  • Simple Web Service Chain
    Web Service WS 1 provides functionality using WS 2, WS 2 provides ...
    Like a chain: the weakest element influences the overall behavior
    Hops: the number of network nodes involved from the source WS to the destination WS
    Example shows 2 hops, 4 Web Services
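
    The chain analogy can be put into numbers. A minimal sketch, assuming the services are called one after another and fail independently; the availability and latency figures are made up for illustration.

```python
from math import prod


def chain_availability(availabilities):
    """Probability that every service in the chain is up (independent failures)."""
    return prod(availabilities)


def chain_latency_ms(latencies_ms):
    """End-to-end delay when each service waits for the next one in the chain."""
    return sum(latencies_ms)


# Hypothetical figures for a chain of four Web Services.
availabilities = [0.999, 0.995, 0.99, 0.999]
latencies_ms = [20, 35, 120, 15]

overall_availability = chain_availability(availabilities)
overall_latency = chain_latency_ms(latencies_ms)

print(f"availability {overall_availability:.4f}, latency {overall_latency} ms")
```

    Under these assumptions the chain is less available than its weakest member, and the slowest service dominates the total delay.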

  • Encoding Complex DataData structures are serialized as XML:
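
    A minimal sketch of the idea, assuming a plain element-per-field mapping: nested dictionaries become nested elements and list entries become repeated `item` elements. This is generic XML serialization, not the SOAP encoding rules themselves, and the `to_xml` helper and the job record are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_xml(name, value):
    """Serialize dicts, lists, and scalars into a tree of XML elements."""
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, item in value.items():
            element.append(to_xml(key, item))
    elif isinstance(value, (list, tuple)):
        for item in value:
            element.append(to_xml("item", item))
    else:
        element.text = str(value)
    return element


# Hypothetical data structure to send over the wire.
job = {"owner": "alice", "cpus": 12, "inputs": ["run1.dat", "run2.dat"]}

xml_text = ET.tostring(to_xml("job", job), encoding="unicode")
print(xml_text)
```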

    SOAP: MustUnderstand SOAP Must Under Error
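
    In SOAP 1.1, a header block marked `mustUnderstand="1"` has to be processed by the receiver; a receiver that does not recognize it must answer with a `MustUnderstand` fault instead of ignoring it. The sketch below shows only that check. The Transaction header, its namespace, and the `check_headers` function are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
MUST_UNDERSTAND = f"{{{SOAP_ENV}}}mustUnderstand"

# Hypothetical request carrying one mandatory header block.
REQUEST = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <t:Transaction xmlns:t="http://example.org/tx" soap:mustUnderstand="1">5</t:Transaction>
  </soap:Header>
  <soap:Body/>
</soap:Envelope>
"""


def check_headers(envelope_text, understood_tags):
    """Return None if processing may continue, else the fault code to send."""
    envelope = ET.fromstring(envelope_text)
    header = envelope.find(f"{{{SOAP_ENV}}}Header")
    if header is None:
        return None
    for block in header:
        if block.get(MUST_UNDERSTAND) == "1" and block.tag not in understood_tags:
            return "MustUnderstand"
    return None


fault_for_naive_receiver = check_headers(REQUEST, understood_tags=set())
fault_for_aware_receiver = check_headers(
    REQUEST, understood_tags={"{http://example.org/tx}Transaction"}
)
```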

  • Security and Features
    In context of HTTP builds on existing security
        HTTPS
        X.509 certificates
    Developers explicitly choose which methods to expose
    Extensibility: the major strength of SOAP
        E.g. check the WS-* specifications
        WS-Security Roadmap

  • WS-Security Roadmap
    [Figure labels: Security; SecurityPolicy; SecureConversation; Trust; Federation; Privacy; Authorization; SOAP Messaging]

  • Discovering Web Services
    Universal Description, Discovery, and Integration (UDDI)
        Specifies what the API for a Web-based Registry looks like
        All about the Yellow, White & Green Pages
    Defines how to run and operate Registry Sites on the Web
    Defines how to pay for its operation: encourages basic lookup services for free
    Further Information at

  • Registry Operation
    Peer nodes (websites)
    Companies register with any node
    Registrations replicated on a daily basis
    Complete set of registered records available at all nodes
    Common set of SOAP APIs supported by all nodes
    Compliance enforced by business contract
    [Figure labels: UDDI.org; IBM; Microsoft; Ariba; other; queries]

  • Why a DNS-like Model?
    Enforces cross-platform compatibility across competitor platforms
    Demonstration of trust and openness
    Avoids tacit endorsement of any one vendor's platform
    May migrate to a third party

  • UDDI provides information
    Who: Business Information
    What: Find the right Type of Business
    Where: To Access a Service
    How: Describes how a given Interface functions
    Information provided at

  • UDDI A Publisher View

  • UDDI and Web Services
    Discovery

    How do we talk? (WSDL)

    Not discussed, but important: policies

  • Resource
    An entity that is to be shared
        E.g., computers, storage, data, software
    Does not have to be a physical entity
        E.g., Condor pool, distributed file system, ...
    Defined in terms of interfaces, not devices
        E.g., schedulers such as LSF and PBS define a compute resource
        Open/close/read/write define access to a distributed file system, e.g. NFS, AFS, DFS
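
    "Defined in terms of interfaces, not devices" maps naturally onto a structural interface. In the sketch below, a client only relies on `submit` and `status`; whether a single machine or a whole pool sits behind them is invisible. All class names and the simulated behavior are hypothetical and do not model LSF, PBS, or Condor.

```python
from typing import Dict, List, Protocol


class ComputeResource(Protocol):
    """What a client may rely on; says nothing about the hardware behind it."""

    def submit(self, command: str) -> str: ...

    def status(self, job_id: str) -> str: ...


class SingleMachine:
    """One computer that 'runs' a job immediately (simulated)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}

    def submit(self, command: str) -> str:
        job_id = f"local-{len(self._jobs) + 1}"
        self._jobs[job_id] = "done"
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


class BatchPool:
    """Many machines behind a queue (simulated); new jobs start as queued."""

    def __init__(self, machines: int) -> None:
        self.machines = machines
        self._jobs: Dict[str, str] = {}

    def submit(self, command: str) -> str:
        job_id = f"pool-{len(self._jobs) + 1}"
        self._jobs[job_id] = "queued"
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


def submit_everywhere(resources: List[ComputeResource], command: str) -> List[str]:
    """Client code written once against the interface."""
    return [resource.status(resource.submit(command)) for resource in resources]


states = submit_everywhere([SingleMachine(), BatchPool(machines=64)], "simulate --events 1000")
print(states)
```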

  • Network Protocol
    A formal description of message formats and a set of rules for message exchange
        Rules may define sequence of message exchanges
        Protocol may define state-change in endpoint, e.g., file system state change
    Good protocols designed to do one thing
        Protocols can be layered
    Examples of protocols
        IP, TCP, TLS (was SSL), HTTP, Kerberos

  • Network Enabled Services
    Implementation of a protocol that defines a set of capabilities
        Protocol defines interaction with service
        All services require protocols
        Not all protocols are used to provide services (e.g. IP, TLS)
    Examples: FTP and Web servers

  • Application Programming Interface
    A specification for a set of routines to facilitate application development
        Refers to definition, not implementation
        E.g., there are many implementations of MPI
    Spec often language-specific (or IDL)
        Routine name, number, order and type of arguments; mapping to language constructs
        Behavior or function of routine
    Examples
        GSS API (security), MPI (message passing)

  • Software Development Kit
    A particular instantiation of an API
    SDK consists of libraries and tools
        Provides implementation of API specification
    Can have multiple SDKs for an API
    Examples of SDKs
        MPICH, Motif Widgets

  • Syntax
    Rules for encoding information, e.g.
        XML, Condor ClassAds, Globus RSL
        X.509 certificate format (RFC 2459)
        Cryptographic Message Syntax (RFC 2630)
    Distinct from protocols
        One syntax may be used by many protocols (e.g., XML); & useful for other purposes
    Syntaxes may be layered
        E.g., Condor ClassAds -> XML -> ASCII
        Important to understand layerings when comparing or evaluating syntaxes

  • A Protocol can have Multiple APIs
    TCP/IP APIs include BSD sockets, Winsock, System V streams, ...
    The protocol provides interoperability: programs using different APIs can exchange information
    I don't need to know the remote user's API
    [Figure: two applications, one using the WinSock API and one using the Berkeley Sockets API, connected by the TCP/IP protocol (reliable byte streams)]
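
    The same point can be shown on one machine. In this sketch the server is written against the file-like API of Python's `socketserver` module and the client against the lower-level `socket` send/recv API; they interoperate because both speak TCP. The uppercase "service" and the loopback address are assumptions for illustration.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.StreamRequestHandler):
    """Server side: uses the file-like rfile/wfile API."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def ask_with_raw_sockets(host, port, text):
    """Client side: uses the lower-level sendall/recv API."""
    with socket.create_connection((host, port), timeout=5) as connection:
        connection.sendall(text.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = connection.recv(1024)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii").strip()


# Port 0 asks the operating system for any free port on the loopback interface.
with socketserver.TCPServer(("127.0.0.1", 0), UppercaseHandler) as server:
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    reply = ask_with_raw_sockets(host, port, "hello grid")
    server.shutdown()

print(reply)
```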

  • An API can have Multiple Protocols
    MPI provides portability: any correct program compiles & runs on a platform
    Does not provide interoperability: all processes must link against same SDK
        E.g., MPICH and LAM versions of MPI

  • APIs and Protocols are Both Important
    Standard APIs/SDKs are important
        They enable application portability
        But w/o standard protocols, interoperability is hard (every SDK speaks every protocol?)
    Standard protocols are important
        Enable cross-site interoperability
        Enable shared infrastructure
        But w/o standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)
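
    A toy model of the converse case, where one API is implemented by two SDKs that use different wire formats. Application code is portable across both, yet a message encoded by one cannot be read by the other. The MessageAPI class and both "SDKs" are hypothetical and have nothing to do with the actual MPICH or LAM wire formats.

```python
import json
import struct
from abc import ABC, abstractmethod


class MessageAPI(ABC):
    """The API: what application code is written against."""

    @abstractmethod
    def encode(self, rank: int, payload: str) -> bytes: ...

    @abstractmethod
    def decode(self, wire: bytes) -> tuple: ...


class JsonSDK(MessageAPI):
    """One implementation: JSON text on the wire."""

    def encode(self, rank, payload):
        return json.dumps({"rank": rank, "payload": payload}).encode("utf-8")

    def decode(self, wire):
        message = json.loads(wire.decode("utf-8"))
        return message["rank"], message["payload"]


class BinarySDK(MessageAPI):
    """Another implementation: 4-byte big-endian rank followed by the payload."""

    def encode(self, rank, payload):
        return struct.pack("!I", rank) + payload.encode("utf-8")

    def decode(self, wire):
        (rank,) = struct.unpack("!I", wire[:4])
        return rank, wire[4:].decode("utf-8")


def interoperates(sender, receiver, rank=3, payload="hello"):
    """True if the receiver recovers exactly what the sender meant."""
    try:
        return receiver.decode(sender.encode(rank, payload)) == (rank, payload)
    except (ValueError, KeyError, TypeError, struct.error):
        return False


same_sdk = interoperates(JsonSDK(), JsonSDK()) and interoperates(BinarySDK(), BinarySDK())
mixed_sdk = interoperates(JsonSDK(), BinarySDK()) or interoperates(BinarySDK(), JsonSDK())
print(same_sdk, mixed_sdk)
```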

  • Why Discuss Architecture?
    Descriptive
        Provide a common vocabulary for use when describing Grid systems
    Guidance
        Identify key areas in which services are required
    Prescriptive
        Define standard Intergrid protocols and APIs to facilitate creation of interoperable Grid systems and portable applications

  • One View of Requirements
    Identity & authentication
    Authorization & policy
    Resource discovery
    Resource characterization
    Resource allocation
    (Co-)reservation, workflow
    Distributed algorithms
    Remote data access
    High-speed data transfer
    Performance guarantees
    Monitoring
    Adaptation
    Intrusion detection
    Resource management
    Accounting & payment
    Fault management
    System evolution
    Etc.

  • Another View: Three Obstacles to Making Grid Computing Routine
    New approaches to problem solving
        Data Grids, distributed computing, peer-to-peer, collaboration grids, ...
    Structuring and writing programs
        Abstractions, tools
    Enabling resource sharing across distinct institutions
        Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; ...

  • Programming & Systems Problems
    The programming problem
        Facilitate development of sophisticated apps
        Facilitate code sharing
        Requires prog. envs: APIs, SDKs, tools
    The systems problem
        Facilitate coordinated use of diverse resources
        Facilitate infrastructure sharing: e.g., certificate authorities, info services
        Requires systems: protocols, services
        E.g., port/service/protocol for accessing information, allocating resources

  • The Systems Problem: Resource Sharing Mechanisms That ...
    Address security and policy concerns of resource owners and users
    Are flexible enough to deal with many resource types and sharing modalities
    Scale to large number of resources, many participants, many program components
    Operate efficiently when dealing with large amounts of data & computation

  • Aspects of the Systems Problem
    Need for interoperability when different groups want to share resources
        Diverse components, policies, mechanisms
        E.g., standard notions of identity, means of communication, resource descriptions
    Need for shared infrastructure services to avoid repeated development, installation
        E.g., one port/service/protocol for remote access to computing, not one per tool/appln
        E.g., Certificate Authorities: expensive to run
    A common need for protocols & services

  • A Protocol-Oriented View of Grid Architecture That Emphasizes ...
    Development of Grid protocols & services
        Protocol-mediated access to remote resources
        New services: e.g., resource brokering
        On the Grid = speak Intergrid protocols
        Mostly (extensions to) existing protocols
    Development of Grid APIs & SDKs
        Interfaces to Grid protocols & services
        Facilitate application development by supplying higher-level abstractions
    The (hugely successful) model is the Internet

  • Layered Grid Architecture(By Analogy to Internet Architecture)

  • Protocols, Services, and APIs Occur at Each Level
    [Figure: layered stack, top to bottom]
        Applications
        Languages/Frameworks
        Collective Service APIs and SDKs; Collective Services; Collective Service Protocols
        Resource APIs and SDKs; Resource Services; Resource Service Protocols
        Connectivity APIs; Connectivity Protocols
        Local Access APIs and Protocols
        Fabric Layer

  • Important Points
    Built on Internet protocols & services
        Communication, routing, name resolution, etc.
    Layering here is conceptual, does not imply constraints on who can call what
        Protocols/services/APIs/SDKs will, ideally, be largely self-contained
        Some things are fundamental: e.g., communication and security
        But, advantageous for higher-level functions to use common lower-level functions

  • The Hourglass Model
    Focus on architecture issues
        Propose set of core services as basic infrastructure
        Use to construct high-level, domain-specific solutions
    Design principles
        Keep participation cost low
        Enable local control
        Support for adaptation
        IP hourglass model
    [Figure: hourglass with applications and diverse global services at the top, core services at the neck, local OS at the bottom]

  • Where Are We With Architecture?
    No official standards exist
    But:
        Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols
        GGF has an architecture working group
        Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information
        Internet drafts submitted in security area

  • Fabric Layer: Protocols & Services
    Just what you would expect: the diverse mix of resources that may be shared
        Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.
    Few constraints on low-level technology: connectivity and resource level protocols form the neck in the hourglass
    Defined by interfaces, not physical characteristics

  • Connectivity Layer: Protocols & Services
    Communication
        Internet protocols: IP, DNS, routing, etc.
    Security: Grid Security Infrastructure (GSI)
        Uniform authentication, authorization, and message protection mechanisms in multi-institutional setting
        Single sign-on, delegation, identity mapping
        Public key technology, SSL, X.509, GSS-API
        Supporting infrastructure: Certificate Authorities, certificate & key management, ...
    GSI:

  • Resource Layer: Protocols & Services
    Grid Resource Allocation Mgmt (GRAM)
        Remote allocation, reservation, monitoring, control of compute resources
    GridFTP protocol (FTP extensions)
        High-performance data access & transport
    Grid Resource Information Service (GRIS)
        Access to structure & state information
    Network reservation, monitoring, control
    All built on connectivity layer: GSI & IP
    GridFTP: www.gridforum.org
    GRAM, GRIS:

  • Collective Layer: Protocols & Services
    Index servers aka metadirectory services
        Custom views on dynamic resource collections assembled by a community
    Resource brokers (e.g., Condor Matchmaker)
        Resource discovery and allocation
    Replica catalogs
    Replication services
    Co-reservation and co-allocation services
    Workflow management services
    Etc.

  • Example: High-Throughput Computing System
    App: High Throughput Computing System
    Collective (App): Dynamic checkpoint, job management, failover, staging
    Collective (Generic): Brokering, certificate authorities
    Resource: Access to data, access to computers, access to network performance data
    Connect: Communication, service discovery (DNS), authentication, authorization, delegation
    Fabric: Storage systems, schedulers

  • Example: Data Grid Architecture
    App: Discipline-Specific Data Grid Application
    Collective (App): Coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, ...
    Collective (Generic): Replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, ...
    Resource: Access to data, access to computers, access to network performance data, ...
    Connect: Communication, service discovery (DNS), authentication, authorization, delegation
    Fabric: Storage systems, clusters, networks, network caches, ...

  • The Programming Problem
    - But how do I develop robust, secure, long-lived, well-performing applications for dynamic, heterogeneous Grids?
    - I need, presumably:
      - Abstractions and models to add to speed/robustness/etc. of development
      - Tools to ease application development and diagnose common problems
      - Code/tool sharing to allow reuse of code components developed by others

  • Grid Programming Technologies
    - Grid applications are incredibly diverse (data, collaboration, computing, sensors, ...)
      - Seems unlikely there is one solution
    - Most applications have been written from scratch, with or without Grid services
    - Application-specific libraries have been shown to provide significant benefits
    - No new language, programming model, etc., has yet emerged that transforms things
      - But certainly still quite possible

  • Examples of Grid Programming Technologies
    - MPICH-G2: Grid-enabled message passing
    - CoG Kits, GridPort: Portal construction, based on N-tier architectures
    - GDMP, Data Grid Tools, SRB: replica management, collection management
    - Condor-G: workflow management
    - Legion: object models for Grid computing
    - Cactus: Grid-aware numerical solver framework
    - Note tremendous variety, application focus

  • MPICH-G2: A Grid-Enabled MPI
    - A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
      - Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
    - Requires services for authentication, resource allocation, executable staging, output, etc.
    - Programs run in wide area without change
    - See also: MetaMPI, PACX, STAMPI, ...

  • Cactus (Allen, Dramlitsch, Seidel, Shalf, Radke)
    - Modular, portable framework for parallel, multidimensional simulations
    - Construct codes by linking
      - Small core (flesh): mgmt services
      - Selected modules (thorns): Numerical methods, grids & domain decomps, visualization and steering, etc.
    - Custom linking/configuration tools
    - Developed for astrophysics, but not astrophysics-specific

  • High-Throughput Computing and Condor
    - High-throughput computing
      - CPU cycles/day (week, month, year?) under non-ideal circumstances
      - "How many times can I run simulation X in a month using all available machines?"
    - Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility
    - Emphasis on policy management and reliability

  • Object-Based Approaches
    - Grid-enabled CORBA
      - NASA Lewis, Rutgers, ANL, others
      - CORBA wrappers for Grid protocols
      - Some initial successes
    - Legion
      - U. Virginia
      - Object models for Grid components (e.g., "vault" = storage, "host" = computer)

  • Portals
    - N-tier architectures enabling thin clients, with middle tiers using Grid functions
      - Thin clients = Web browsers
      - Middle tier = e.g. Java Server Pages, with Java CoG Kit, GPDK, GridPort utilities
      - Bottom tier = various Grid resources
    - Numerous applications and projects, e.g.
      - Unicore, Gateway, Discover, Mississippi Computational Web Portal, NPACI Grid Port, Lattice Portal, Nimrod-G, Cactus, NASA IPG Launchpad, Grid Resource Broker, ...

  • Common Toolkit Underneath
    - Each of these programming environments should not have to implement the protocols and services from scratch!
    - Rather, want to share common code that
      - Implements core functionality
        - Software Development Kits (SDKs) that can be used to construct a large variety of services and clients
        - Standard services that can be easily deployed
      - Is robust, well-architected, self-consistent
      - Is open source, with broad input

  • General Approach
    - Define Grid protocols & APIs
      - Protocol-mediated access to remote resources
      - Integrate and extend existing standards
      - "On the Grid" = speak "Intergrid" protocols
    - Develop a reference implementation
      - Client and server SDKs, services, tools, etc.
    - Grid-enable wide variety of tools
    - Learn through deployment and applications

  • Globus Toolkit
    - A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
      - Offer a modular "bag of technologies"
      - Enable incremental development of Grid-enabled tools and applications
      - Implement standard Grid protocols and APIs
      - Make available under liberal open source license
    - Current version is 4.0, commonly referred to as GT4

  • Key Concepts for GT4
    - OGSA, WSRF, and GT4: these are basic architecture components for GT4
    - Open Grid Services Architecture (OGSA)
    - Web Services: OGSA, WSRF, and GT4 are based on standard Web Services technologies such as SOAP and WSDL. You need to be familiar with the Web Services architecture and languages.
    - The Web Services Resource Framework: WSRF is the core of GT4.

  • Key Concepts for GT4 (cont)
    - The GT4 Architecture: Based on WS-Resources and Web Services, and grid computing
    - Java & XML: to use GT4, you need to be able to program in Java, and to understand basic XML.

  • OGSA Key Requirements
    - Interoperability and Support for Dynamic and Heterogeneous Environments
    - Resource Sharing Across Organizations
    - Optimization
    - Quality of Service (QoS) Assurance
    - Job Execution
    - Data Services
    - Security
    - Administrative Cost Reduction
    - Scalability
    - Availability
    - Ease of Use and Extensibility

  • OGSA Defines Basic Capabilities
    - Infrastructure Services
    - Execution Management Services
    - Data Services
    - Resource Management Services
    - Security Services
    - Self-Management Services
    - Information Services
    - Security Considerations

  • OGSA, WSRF, and GT4

  • GT4 Roadmap

  • History and Motivation
    - Do we want standard APIs?
      - E.g. MPI (Message Passing Interface)
    - But on the grid, we actually want standard wire protocols
      - The API can be different on each system

  • History and Motivation (cont)
    - Open Grid Services Infrastructure (OGSI)
      - Global Grid Forum (GGF) standard
      - Identified a number of common "building blocks" used in grid protocols
        - Inspecting state, creating and removing state, detecting changes in state, naming state
      - Defined standard ways to do these things, based on Web services (defined a thing called a Grid Service)

  • History and Motivation (cont)
    - But then...
      - Realized that this was useful for Web services in general, not just for the grid.
      - Moved out of GGF, into OASIS
      - Split the single OGSI specification into a number of other specifications called WSRF.

  • Globus Toolkit
    - Grid infrastructure software
    - Four key protocols
      - Security/Authentication (GSI)
      - Resource Management/Scheduling (GRAM)
      - Resource description (GRIS/GIIS)
      - Data/File transfer (GASS, GridFTP)

  • Grid Security Infrastructure (GSI)

  • Security Terminology
    - Authentication: Establishing identity
    - Authorization: Establishing rights
    - Message protection
      - Message integrity
      - Message confidentiality
    - Non-repudiation
    - Digital signature
    - Accounting
    - Certificate Authority (CA)

  • Why Grid Security is Hard
    - Resources being used may be valuable & the problems being solved sensitive
    - Resources are often located in distinct administrative domains
      - Each resource has own policies & procedures
    - Set of resources used by a single computation may be large, dynamic, and unpredictable
      - Not just client/server, requires delegation
    - It must be broadly available & applicable
      - Standard, well-tested, well-understood protocols; integrated with wide variety of tools

  • GSI in Action
    - "Create Processes at A and B that Communicate & Access Files at C"
    - (Figure: a user's computer and three sites. Site A runs Kerberos, Site B runs Unix, Site C runs Kerberos.)



  • Grid Security Requirements

  • Candidate Standards
    - Kerberos 5
      - Fails to meet requirements:
        - Integration with various local security solutions
        - User based trust model
    - Transport Layer Security (TLS/SSL)
      - Fails to meet requirements:
        - Single sign-on
        - Delegation

  • Grid Security Infrastructure (GSI)
    - Extensions to standard protocols & APIs
      - Standards: SSL/TLS, X.509 & CA, GSS-API
      - Extensions for single sign-on and delegation
    - Globus Toolkit reference implementation of GSI
      - SSLeay/OpenSSL + GSS-API + SSO/delegation
      - Tools and services to interface to local security
        - Simple ACLs; SSLK5/PKINIT for access to K5, AFS; ...
      - Tools for credential management
        - Login, logout, etc.
        - Smartcards
        - MyProxy: Web portal login and delegation
        - K5cert: Automatic X.509 certificate creation

  • Review of Public Key Cryptography
    - Asymmetric keys
      - A private key is used to encrypt data.
      - A public key can decrypt data encrypted with the private key.
    - An X.509 certificate includes
      - Someone's subject name (user ID)
      - Their public key
      - A "signature" from a Certificate Authority (CA) that:
        - Proves that the certificate came from the CA
        - Vouches for the subject name
        - Vouches for the binding of the public key to the subject

  • Public Key Based Authentication
    - User sends certificate over the wire.
    - Other end sends user a challenge string.
    - User encodes the challenge string with private key
      - Possession of private key means you can authenticate as subject in certificate
    - Public key is used to decode the challenge.
      - If you can decode it, you know the subject
    - Treat your private key carefully!!
      - Private key is stored only in well-guarded places, and only in encrypted form
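
The challenge/response exchange described on this slide can be sketched in a few lines. This is only an illustration under loud assumptions: it uses textbook RSA with tiny hard-coded primes and no padding, so it is not secure and it is not how GSI or any SSL library implements the handshake. The function names (`make_challenge`, `respond`, `verify`) are hypothetical.

```python
import secrets

# Toy RSA key pair with textbook-sized numbers (illustration only, NOT secure).
P, Q = 61, 53
N = P * Q        # public modulus (3233), part of the public key
E = 17           # public exponent, part of the public key
D = 2753         # private exponent, chosen so that (E * D) % ((P - 1) * (Q - 1)) == 1


def make_challenge() -> int:
    """Verifier picks a random challenge in the range [2, N - 1]."""
    return secrets.randbelow(N - 2) + 2


def respond(challenge: int) -> int:
    """User encodes the challenge with the private key."""
    return pow(challenge, D, N)


def verify(challenge: int, response: int) -> bool:
    """Verifier decodes the response with the public key from the certificate."""
    return pow(response, E, N) == challenge


challenge = make_challenge()
response = respond(challenge)
assert verify(challenge, response)                 # holder of the private key passes
assert not verify(challenge, (response + 1) % N)   # a tampered response is rejected
```

Only the holder of `D` can produce a response that decodes back to the challenge, which is why possession of the private key is enough to authenticate as the certificate's subject.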

  • X.509 Proxy Certificate
    - Defines how a short term, restricted credential can be created from a normal, long-term X.509 credential
      - A "proxy certificate" is a special type of X.509 certificate that is signed by the normal end entity cert, or by another proxy
      - Supports single sign-on & delegation through "impersonation"
      - Currently an IETF draft

  • User Proxies
    - Minimize exposure of user's private key
    - A temporary, X.509 proxy credential for use by our computations
      - We call this a user proxy certificate
      - Allows process to act on behalf of user
      - User-signed user proxy cert stored in local file
      - Created via "grid-proxy-init" command
    - Proxy's private key is not encrypted
      - Rely on file system security; proxy certificate file must be readable only by the owner

  • Delegation
    - Remote creation of a user proxy
    - Results in a new private key and X.509 proxy certificate, signed by the original key
    - Allows remote process to act on behalf of the user
    - Avoids sending passwords or private keys across the network

  • Globus Security APIs
    - Generic Security Service (GSS) API
      - IETF standard
      - Provides functions for authentication, delegation, message protection
      - Decoupled from any particular communication method
    - But GSS-API is somewhat complicated, so we also provide the easier-to-use globus_gss_assist API.
    - GSI-enabled SASL is also provided

  • Results
    - GSI adopted by 100s of sites, 1000s of users
      - Globus CA has issued >3000 certs (user & host), >1500 currently active; other CAs active
    - Rollouts are currently underway all over:
      - NSF Teragrid, NASA Information Power Grid, DOE Science Grid, European Data Grid, etc.
    - Integrated in research & commercial apps
      - GrADS testbed, Earth Systems Grid, European Data Grid, GriPhyN, NEESgrid, etc.
    - Standardization begun in Global Grid Forum, IETF

  • GSI Applications
    - Globus Toolkit uses GSI for authentication
    - Many Grid tools, directly or indirectly, e.g.
      - Condor-G, SRB, MPICH-G2, Cactus, GDMP, ...
    - Commercial and open source tools, e.g.
      - ssh, ftp, cvs, OpenLDAP, OpenAFS
      - SecureCRT (Win32 ssh client)
    - And since we use standard X.509 certificates, they can also be used for
      - Web access, LDAP server access, etc.

  • Ongoing and Future GSI Work
    - Protection against compromised resources
      - Restricted delegation, smartcards
    - Standardization
    - Scalability in numbers of users & resources
      - Credential management
      - Online credential repositories ("MyProxy")
      - Account management
    - Authorization
      - Policy languages
      - Community authorization

  • Proxy Certificate Standards Work
    - "Internet Public Key Infrastructure X.509 Proxy Certificate Profile"
      - draft-ietf-pkix-proxy-01.txt
      - Draft being considered by IETF PKIX working group, and by GGF GSI working group
      - Defines proxy certificate format, including restricted rights and delegation tracing
    - Demonstrated a prototype of restricted proxies at HPDC (August 2001) as part of CAS demo

  • GSS-API Extensions Work
    - 4 years of GSS-API experience, while on the whole quite positive, has shed light on various deficiencies of GSS-API
    - "GSS-API Extensions"
      - draft-ggf-gss-extensions-04.txt
      - Draft being considered by GGF GSI working group. Not yet submitted to IETF.
      - Defines extensions to the GSS-API to better support Grid security

  • GSS-API Extensions
    - Credential export/import
      - Allows delegated credentials to be externalized
      - Used for checkpointing a service
    - Delegation at any time, in either direction
      - Richer options on use of delegation
    - Restricted delegation handling
      - Add proxy restrictions to delegated cred
      - Inspect auth cert for restrictions
    - Allow better mapping of GSS to TLS
      - Support TLS framing of messages

  • Community Authorization Service
    - Question: How does a large community grant its users access to a large set of resources?
      - Should minimize burden on both the users and resource providers
    - Community Authorization Service (CAS)
      - Community negotiates access to resources
      - Resource outsources fine-grain authorization to CAS
      - Resource only knows about "CAS user" credential
        - CAS handles user registration, group membership, ...
      - User who wants access to resource asks CAS for a capability credential
        - Restricted proxy of the "CAS user" cred., checked by resource

  • Community Authorization (Prototype shown August 2001)

  • Community Authorization Service
    - CAS provides user community with information needed to authenticate resources
      - Sent with capability credential, used on connection with resource
      - Resource identity (DN), CA
    - This allows new resources/users (and their CAs) to be made available to a community through the CAS without action on the other users'/resources' part

  • Authorization API
    - Service providers need to perform authorization policy evaluation on:
      - Local policies
      - Policies contained in restricted proxies
    - We are working on 2 API layers:
      - Low level GAA-API implementation for evaluation of policies
      - High level, very simple authorization API that can easily be embedded into services
    - Still in early prototyping stage

  • Passport Online CA & MyProxy
    - Requiring users to manage their own certs and keys is annoying and error prone
    - A solution: Leverage Passport global authentication to obtain a proxy credential
      - Passport provides
        - Globally unique user name (email address)
        - Method of verifying ownership of the name (authentication)
        - Re-issuance (e.g. forgotten password)
      - Passport credentials can be presented to an online CA or credential repository
        - Creates and issues new (restricted) proxy certificate to the user on demand

  • Other Future Security Work
    - Ease-of-use
      - Improved error messages, online CA, etc.
    - Improved online credential repositories
      - See MyProxy paper at HPDC
    - Support for multiple user credentials
    - Multi-factor authentication
    - Subordinate certificate authorities for domains
      - Ease issuance of host certs for domains
    - Independent Data Unit Support

  • Security Summary
    - GSI successfully addresses wide variety of Grid security issues
    - Broad acceptance, deployment, integration with tools
    - Standardization ongoing in IETF & GGF
    - Ongoing R&D to address next set of issues
    - For more information:
      - "Security Architecture for Computational Grids"
      - "Design and Deployment of a National-Scale Authentication"

  • Grid Resource Allocation Management (GRAM)

  • The Challenge
    - Enabling secure, controlled remote access to heterogeneous computational resources and management of remote computation
      - Authentication and authorization
      - Resource discovery & characterization
      - Reservation and allocation
      - Computation monitoring and control
    - Addressed by new protocols & services
      - GRAM protocol as a basic building block
      - Resource brokering & co-allocation services
      - GSI for security, MDS for discovery

  • Resource Management
    - The Grid Resource Allocation Management (GRAM) protocol and client API allow programs to be started on remote resources, despite local heterogeneity
    - Resource Specification Language (RSL) is used to communicate requirements
    - A layered architecture allows application-specific resource brokers and co-allocators to be defined in terms of GRAM services
      - Integrated with Condor, PBS, MPICH-G2, ...

  • Resource Management Architecture
    - (Figure; labels: Application, RSL, RSL specialization, Ground RSL, Simple ground RSL, Information Service, Queries & Info, three GRAM instances, local resource managers LSF, Condor, and NQE.)

  • Resource Specification Language
    - Common notation for exchange of information between components
      - Syntax similar to MDS/LDAP filters
    - RSL provides two types of information:
      - Resource requirements: Machine type, number of nodes, memory, etc.
      - Job configuration: Directory, executable, args, environment
    - Globus Toolkit provides an API/SDK for manipulating RSL

  • RSL Syntax
    - Elementary form: parenthesis clauses
      - (attribute op value [ value ... ] )
    - Operators supported: <, <=, =, >=, >, !=
    - Some supported attributes:
      - executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
    - Unknown attributes are passed through
      - May be handled by subsequent tools

  • Constraints: &
    - For example:
        & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog)
    - "Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours"

  • Disjunction: |
    - For example:
        & (executable=myprog) ( | (&(count=5)(memory>=64)) (&(count=10)(memory>=32)))
    - Create 5 instances of myprog on a machine that has at least 64 MB of memory, or 10 instances on a machine with at least 32 MB of memory
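
Writing nested RSL by hand is error prone, so a small string builder can help. The sketch below is an illustration only: the helpers `relation`, `group`, `conj`, and `disj` are hypothetical names, not the Globus `globus_rsl` API, and the output omits whitespace between clauses on the assumption that an RSL parser treats it as insignificant.

```python
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def relation(attribute: str, op: str, value: object) -> str:
    """Build one elementary clause such as (memory>=64)."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    return f"({attribute}{op}{value})"


def group(expression: str) -> str:
    """Wrap a nested request in parentheses."""
    return f"({expression})"


def conj(*clauses: str) -> str:
    """Conjunction: every clause must hold."""
    return "&" + "".join(clauses)


def disj(*clauses: str) -> str:
    """Disjunction: at least one clause must hold."""
    return "|" + "".join(clauses)


# Rebuild the disjunction example from the slide above.
request = conj(
    relation("executable", "=", "myprog"),
    group(disj(
        group(conj(relation("count", "=", 5), relation("memory", ">=", 64))),
        group(conj(relation("count", "=", 10), relation("memory", ">=", 32))),
    )),
)
print(request)
```

Composing the request from functions keeps the parentheses balanced by construction, which is the main source of mistakes in hand-written RSL.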

  • GRAM Protocol Evolution
    - GRAM-1: Simple HTTP-based RPC
      - Job request
        - Returns a "job contact": Opaque string that can be passed between clients, for access to job
      - Job cancel, status, signal
      - Event notification (callbacks) for state changes
        - Pending, active, done, failed, suspended
    - GRAM-1.5 (U Wisconsin contribution)
      - Add reliability improvements
        - Once-and-only-once submission
        - Recoverable job manager service
        - Reliable termination detection
    - GRAM-2: Moving to Web Services (SOAP)
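
The job states and state-change callbacks listed for GRAM-1 can be modelled as a small state machine. The following is a minimal sketch, not the GRAM client API: the `Job` class, the `on_state_change` method, and in particular the `ALLOWED` transition table are assumptions made for illustration, since the slide names the states but not the legal moves between them.

```python
from enum import Enum
from typing import Callable, List


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DONE = "done"
    FAILED = "failed"


# Assumed transition table (the slide lists the states, not the transitions).
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.SUSPENDED, JobState.DONE, JobState.FAILED},
    JobState.SUSPENDED: {JobState.ACTIVE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    """A job identified by an opaque contact string, as in GRAM-1."""

    def __init__(self, contact: str) -> None:
        self.contact = contact
        self.state = JobState.PENDING
        self._callbacks: List[Callable[[str, JobState], None]] = []

    def on_state_change(self, callback: Callable[[str, JobState], None]) -> None:
        """Register a callback invoked with (contact, new_state)."""
        self._callbacks.append(callback)

    def transition(self, new_state: JobState) -> None:
        """Move to new_state and notify every registered callback."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        for callback in self._callbacks:
            callback(self.contact, new_state)


events: List[str] = []
job = Job("hypothetical-job-contact")
job.on_state_change(lambda contact, state: events.append(state.value))
job.transition(JobState.ACTIVE)
job.transition(JobState.DONE)
print(events)  # ['active', 'done']
```

Because the contact is just a string, any client that holds it could register its own callback, which mirrors the idea of passing the job contact between clients.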

  • Globus Toolkit Implementation
    - Gatekeeper
      - Single point of entry
      - Authenticates user, maps to local security environment, runs service
      - In essence, a "secure inetd"
    - Job manager
      - A gatekeeper service
      - Layers on top of local resource management system (e.g., PBS, LSF, etc.)
      - Handles remote interaction with the job

  • GRAM Components
    - (Figure; labels: Client, site boundary, Grid Security Infrastructure, Gatekeeper, Job Manager, RSL Library, Local Resource Manager, processes, MDS: Grid Index Info Server, MDS: Grid Resource Info Server.)
    - Client side: MDS client API calls to locate resources; MDS client API calls to get resource info; GRAM client API calls to request resource allocation and process creation; GRAM client API state change callbacks
    - Server side: Gatekeeper creates Job Manager; Job Manager parses the request (RSL Library), allocates & creates processes, monitors & controls them; resource info server queries current status of resource

  • Co-allocation
    - Simultaneous allocation of a resource set
      - Handled via optimistic co-allocation based on free nodes or queue prediction
      - In the future, advance reservations will also be supported (already in prototype)
    - Globus APIs/SDKs support the co-allocation of specific multi-requests
      - Uses a Globus component called the Dynamically Updated Request Online Co-allocator (DUROC)
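
The "all subjobs or none" idea behind optimistic co-allocation can be sketched as follows. This is not DUROC and does not use any Globus API: the function `co_allocate`, the resource names, and the in-memory free-node table are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple


def co_allocate(free_nodes: Dict[str, int], subjobs: List[Tuple[str, int]]) -> bool:
    """Reserve nodes for every (resource, count) subjob, or for none of them.

    Optimistic strategy: grab nodes one resource at a time based on the
    currently free nodes, and roll everything back on the first failure.
    """
    granted: List[Tuple[str, int]] = []
    for resource, count in subjobs:
        if free_nodes.get(resource, 0) >= count:
            free_nodes[resource] -= count
            granted.append((resource, count))
        else:
            # One subjob cannot be satisfied: release what was already taken.
            for taken_resource, taken_count in granted:
                free_nodes[taken_resource] += taken_count
            return False
    return True


pool = {"cluster-a": 8, "cluster-b": 2}
first = co_allocate(pool, [("cluster-a", 5), ("cluster-b", 1)])   # both subjobs fit
second = co_allocate(pool, [("cluster-a", 2), ("cluster-b", 4)])  # cluster-b is too small
print(first, second, pool)
```

The rollback step is what makes the allocation atomic from the application's point of view: a failed request leaves the pool exactly as it found it.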

  • Multirequest: +
    - A multirequest allows us to specify multiple resource needs, for example
        + (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2))
      - Execute 5 instances of p1 on a machine with at least 64 MB of memory
      - Execute p2 on a machine with an ATM connection
    - Multirequests are central to co-allocation

  • A Co-allocation Multirequest
      +( & (resourceManagerContact= (count=1) (label="subjob A") (executable= my_app1) )
       ( & (resourceManagerContact=") (count=2) (label="subjob B") (executable=my_app2) )

  • Job Submission Interfaces
    - Globus Toolkit includes several command line programs for job submission
      - globus-job-run: Interactive jobs
      - globus-job-submit: Batch/offline jobs
      - globusrun: Flexible scripting infrastructure
    - Others are building better interfaces
      - General purpose
        - Condor-G, PBS, GRD, Hotpage, etc.
      - Application specific
        - ECCE, Cactus, Web portals

  • globus-job-run
    - For running of interactive jobs
    - Additional functionality beyond rsh
      - Ex: Run 2 process job w/ executable staging
          globus-job-run -: host -np 2 -s myprog arg1 arg2
      - Ex: Run 5 processes across 2 hosts
          globus-job-run \
            -: host1 -np 2 -s myprog.linux arg1 \
            -: host2 -np 3 -s myprog.aix arg2
    - For list of arguments run:
          globus-job-run -help

  • globus-job-submit
    - For running of batch/offline jobs
      - globus-job-submit: Submit job
        - Same interface as globus-job-run
        - Returns immediately
      - globus-job-status: Check job status
      - globus-job-cancel: Cancel job
      - globus-job-get-output: Get job stdout/err
      - globus-job-clean: Cleanup after job

  • globusrun
    - Flexible job submission for scripting
      - Uses an RSL string to specify job request
      - Contains an embedded globus-gass-server
        - Defines GASS URL prefix in RSL substitution variable:
            (stdout=$(GLOBUSRUN_GASS_URL)/stdout)
      - Supports both interactive and offline jobs
    - Complex to use
      - Must write RSL by hand
      - Must understand its esoteric features
      - Generally you should use globus-job-* commands instead

  • Resource Management APIs
    - The globus_gram_client API provides access to all of the core job submission and management capabilities, including callback capabilities for monitoring job status.
    - The globus_rsl API provides convenience functions for manipulating and constructing RSL strings.
    - The globus_gram_myjob API allows multi-process jobs to self-organize and to communicate with each other.
    - The globus_duroc_control and globus_duroc_runtime APIs provide access to multirequest (co-allocation) capabilities.

  • Advance Reservation and Other Generalizations
    - General-purpose Architecture for Reservation and Allocation (GARA)
      - 2nd generation resource management services
    - Broadens GRAM on two axes
      - Generalize to support various resource types
        - CPU, storage, network, devices, etc.
      - Advance reservation of resources, in addition to allocation
    - Currently a research prototype

  • GARA: The Big Picture

  • Grid Information Services
    - System information is critical to operation of the grid and construction of applications
      - What resources are available?
        - Resource discovery
      - What is the "state" of the grid?
        - Resource selection
      - How to optimize resource use?
        - Application configuration and adaptation
    - We need a general information infrastructure to answer these questions

  • Examples of Useful Information
    - Characteristics of a compute resource
      - IP address, software available, system administrator, networks connected to, OS version, load
    - Characteristics of a network
      - Bandwidth and latency, protocols, logical topology
    - Characteristics of the Globus infrastructure
      - Hosts, resource managers

  • Grid Information: Facts of Life
    - Information is always old
      - Time of flight, changing system state
      - Need to provide quality metrics
    - Distributed state hard to obtain
      - Complexity of global snapshot
    - Components will fail
    - Scalability and overhead
    - Many different usage scenarios
      - Heterogeneous policy, different information organizations, etc.

  • Grid Information Service
    - Provide access to static and dynamic information regarding system components
    - A basis for configuration and adaptation in heterogeneous, dynamic environments
    - Requirements and characteristics
      - Uniform, flexible access to information
      - Scalable, efficient access to dynamic data
      - Access to multiple information sources
      - Decentralized maintenance

  • Two Classes Of Information Servers
    - Resource Description Services
      - Supply information about a specific resource (e.g. Globus 1.1.3 GRIS).
    - Aggregate Directory Services
      - Supply a collection of information gathered from multiple GRIS servers (e.g. Globus 1.1.3 GIIS).
      - Customized naming and indexing

  • Information Protocols
    - Grid Resource Registration Protocol
      - Support information/resource discovery
      - Designed to support machine/network failure
    - Grid Resource Inquiry Protocol
      - Query resource description server for information
      - Query aggregate server for information
      - LDAP V3.0 in Globus 1.1.3

  • GIS Architecture
    - (Figure; labels: Users, Enquiry Protocol, Customized Aggregate Directories (A, A), Registration Protocol, Standard Resource Description Services (R, R, R, R).)

  • Metacomputing Directory Service
    - Use LDAP as inquiry protocol
    - Access information in a distributed directory
      - Directory represented by collection of LDAP servers
      - Each server optimized for particular function
    - Directory can be updated by:
      - Information providers and tools
      - Applications (i.e., users)
      - Backend tools which generate info on demand
    - Information dynamically available to tools and applications

  • Two Classes Of MDS Servers
    - Grid Resource Information Service (GRIS)
      - Supplies information about a specific resource
      - Configurable to support multiple information providers
      - LDAP as inquiry protocol
    - Grid Index Information Service (GIIS)
      - Supplies a collection of information gathered from multiple GRIS servers
      - Supports efficient queries against information which is spread across multiple GRIS servers
      - LDAP as inquiry protocol

  • LDAP Details
    - Lightweight Directory Access Protocol
      - IETF Standard
      - Stripped down version of X.500 DAP protocol
      - Supports distributed storage/access (referrals)
      - Supports authentication and access control
    - Defines:
      - Network protocol for accessing directory contents
      - Information model defining form of information
      - Namespace defining how information is referenced and organized

  • MDS Components
    - LDAP 3.0 Protocol Engine
      - Based on OpenLDAP with custom backend
      - Integrated caching
    - Information providers
      - Deliver resource information to backend
    - APIs for accessing & updating MDS contents
      - C, Java, PERL (LDAP API, JNDI)
    - Various tools for manipulating MDS contents
      - Command line tools, shell scripts & GUIs

  • Grid Resource Information Service
    - Server which runs on each resource
      - Given the resource DNS name, you can find the GRIS server (well known port = 2135)
    - Provides resource specific information
      - Much of this information may be dynamic
        - Load, process information, storage information, etc.
        - GRIS gathers this information on demand
    - "White pages" lookup of resource information
      - Ex: How much memory does machine have?
    - "Yellow pages" lookup of resource options
      - Ex: Which queues on machine allow large jobs?

  • Grid Index Information Service
    - GIIS describes a class of servers
      - Gathers information from multiple GRIS servers
      - Each GIIS is optimized for particular queries
        - Ex1: Which Alliance machines are >16 process SGIs?
        - Ex2: Which Alliance storage servers have >100Mbps bandwidth to host X?
      - Akin to web search engines
    - Organization GIIS
      - The Globus Toolkit ships with one GIIS
      - Caches GRIS info with long update frequency
        - Useful for queries across an organization that rely on relatively static information (Ex1 above)
      - Can be merged into GRIS
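
The relationship between per-resource servers and a caching index can be sketched as below. This is an illustration under stated assumptions, not MDS code: `ResourceInfoServer` stands in for a GRIS, `IndexServer` for an organization GIIS, and the host names, attribute names, and time-to-live policy are all hypothetical. A real deployment would speak LDAP rather than call Python methods.

```python
import time
from typing import Callable, Dict, List, Optional


class ResourceInfoServer:
    """Stand-in for a GRIS: gathers information about one resource on demand."""

    def __init__(self, host: str, probe: Callable[[], Dict[str, object]]) -> None:
        self.host = host
        self._probe = probe

    def query(self) -> Dict[str, object]:
        return dict(self._probe(), host=self.host)


class IndexServer:
    """Stand-in for a GIIS: caches entries from registered resource servers."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._servers: List[ResourceInfoServer] = []
        self._cache: List[Dict[str, object]] = []
        self._fetched_at: Optional[float] = None

    def register(self, server: ResourceInfoServer) -> None:
        self._servers.append(server)
        self._fetched_at = None  # force a refresh on the next search

    def search(self, predicate: Callable[[Dict[str, object]], bool]) -> List[str]:
        """Return hosts whose cached entry satisfies the predicate."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at > self._ttl:
            self._cache = [server.query() for server in self._servers]
            self._fetched_at = now
        return [str(entry["host"]) for entry in self._cache if predicate(entry)]


load = {"value": 0.2}   # mutable so the example can change the "live" load
fake_now = [0.0]        # injectable clock keeps the example deterministic
gris_a = ResourceInfoServer("a.example.org", lambda: {"cpus": 32, "load": load["value"]})
gris_b = ResourceInfoServer("b.example.org", lambda: {"cpus": 8, "load": 0.9})

giis = IndexServer(ttl_seconds=300, clock=lambda: fake_now[0])
giis.register(gris_a)
giis.register(gris_b)
big = giis.search(lambda entry: entry["cpus"] > 16)
print(big)  # ['a.example.org']
```

A long time-to-live suits mostly static attributes such as CPU count. Queries on fast-changing values such as load can return stale answers until the cache expires, which is the trade-off the slide points at.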

  • Information Services API
    - RFC 1823 defines an IETF draft standard client API for accessing LDAP databases
      - Connect to server
      - Pose query which returns data structures containing sets of object classes and attributes
      - Functions to walk these data structures
    - Globus does not provide an LDAP API. We recommend the use of OpenLDAP, an open source implementation of RFC 1823.

  • Searching an LDAP Directory
      grid-info-search [options] filter [attributes]

    DataGrid is a project funded by the European Union

    Exactly like grid-info-search, except defaults:
      -h localhost (GRIS server)
      -p 2135 (GRIS port)


  • Data Grid Problem
    - "Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data"
    - Note that this problem:
      - Is common to many areas of science
      - Overlaps strongly with other Grid problems

  • Data Intensive Issues Include
    - Harness [potentially large numbers of] data, storage, network resources located in distinct administrative domains
    - Respect local and global policies governing what can be used for what
    - Schedule resources efficiently, again subject to local and global constraints
    - Achieve high performance, with respect to both speed and reliability
    - Catalog software and virtual data

  • Data Intensive Computing and Grids
    - The term "Data Grid" is often used
      - Unfortunate as it implies a distinct infrastructure, which it isn't; but easy to say
    - Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, ...
      - Security, resource mgt, info services, etc.
    - Important to exploit commonalities as very unlikely that multiple infrastructures can be maintained
    - Fortunately this seems easy to do!

  • Examples of Desired Data Grid Functionality
    - High-speed, reliable access to remote data
    - Automated discovery of "best" copy of data
    - Manage replication to improve performance
    - Co-schedule compute, storage, network
    - "Transparency" wrt delivered performance
    - Enforce access control on data
    - Allow representation of "global" resource allocation policies

  • A Model Architecture for Data Grids
    - (Figure; labels: Application, Attribute Specification, Metadata Catalog, Logical Collection and Logical File Name, Replica Catalog, Multiple Locations, Replica Selection, Selected Replica, Performance Information & Predictions, NWS, MDS, GridFTP Control Channel, GridFTP Data Channel, Replica Locations 1-3 with Tape Library, Disk Array, and Disk Caches.)

  • Globus Toolkit ComponentsTwo major Data Grid components:

    2. Replica Management Architecture
       - Simple scheme for managing:
         - multiple copies of files
         - collections of files

  • Motivation for a Common Data Access Protocol
    - Existing distributed data storage systems
      - DPSS, HPSS: focus on high-performance access, utilize parallel data transfer, striping
      - DFS: focus on high-volume usage, dataset replication, local caching
      - SRB: connects heterogeneous data collections, uniform client interface, metadata queries
    - Problems
      - Incompatible (and proprietary) protocols
        - Each requires a custom client
        - Partitions available data sets and storage devices
      - Each protocol has subset of desired functionality

  • A Common, Secure, Efficient Data Access Protocol
    - Common, extensible transfer protocol
      - Common protocol means all can interoperate
    - Decouple low-level data transfer mechanisms from the storage service
    - Advantages:
      - New, specialized storage systems are automatically compatible with existing systems
      - Existing systems have richer data transfer functionality
    - Interface to many storage systems
      - HPSS, DPSS, file systems
      - Plan for SRB integration

  • Access/Transport Protocol Requirements
    - Suite of communication libraries and related tools that support
      - GSI, Kerberos security
      - Third-party transfers
      - Parameter set/negotiate
      - Partial file access
      - Reliability/restart
      - Large file support
      - Data channel reuse
      - Integrated instrumentation
      - Logging/audit trail
      - Parallel transfers
      - Striping (cf DPSS)
      - Policy-based access control
      - Server-side computation
      - Proxies (firewall, load bal)
    - All based on a standard, widely deployed protocol

  • GridFTP and Grid Access to Secondary Storage (GASS)

  • GridFTP
    - Why FTP?
      - Ubiquity enables interoperation with many commodity tools
      - Already supports many desired features, easily extended to support others
      - Well understood and supported
    - We use the term GridFTP to refer to
      - Transfer protocol which meets requirements
      - Family of tools which implement the protocol
    - Note GridFTP > FTP
    - Note that despite name, GridFTP is not restricted to file transfer!

  • GridFTP: Basic Approach
    - FTP protocol is defined by several IETF RFCs
    - Start with most commonly used subset
      - Standard FTP: get/put etc., 3rd-party transfer
    - Implement standard but often unused features
      - GSS binding, extended directory listing, simple restart
    - Extend in various ways, while preserving interoperability with existing servers
      - Striped/parallel data channels, partial file, automatic & manual TCP buffer setting, progress monitoring, extended restart
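
Partial file access and parallel data channels fit together naturally: a transfer is split into byte ranges, each range travels independently, and the receiver writes each block at its offset. The sketch below shows only that idea, under loud assumptions: it copies an in-memory byte string with threads instead of opening network data channels, and `split_range`, `read_partial`, and `parallel_copy` are hypothetical helper names, not part of any GridFTP library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_range(size: int, channels: int) -> List[Tuple[int, int]]:
    """Return (offset, length) blocks that together cover bytes [0, size)."""
    base, extra = divmod(size, channels)
    blocks: List[Tuple[int, int]] = []
    offset = 0
    for index in range(channels):
        length = base + (1 if index < extra else 0)
        if length:
            blocks.append((offset, length))
        offset += length
    return blocks


def read_partial(source: bytes, offset: int, length: int) -> Tuple[int, bytes]:
    """Partial file access: fetch one block, tagged with its offset."""
    return offset, source[offset:offset + length]


def parallel_copy(source: bytes, channels: int) -> bytes:
    """Fetch all blocks concurrently and reassemble them by offset."""
    destination = bytearray(len(source))
    with ThreadPoolExecutor(max_workers=channels) as pool:
        futures = [pool.submit(read_partial, source, offset, length)
                   for offset, length in split_range(len(source), channels)]
        for future in futures:
            offset, data = future.result()
            destination[offset:offset + len(data)] = data
    return bytes(destination)


payload = bytes(range(256)) * 10
copied = parallel_copy(payload, channels=4)
print(copied == payload)  # True
```

Because every block carries its own offset, blocks may arrive in any order, and an interrupted transfer could be resumed by requesting only the missing ranges.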

  • GridFTP Protocol Specifications
    - Existing standards
      - RFC 959: File Transfer Protocol
      - RFC 2228: FTP Security Extensions
      - RFC 2389: Feature Negotiation for the File Transfer Protocol
      - Draft: FTP Extensions
    - New drafts
      - GridFTP: Protocol Extensions to FTP for the Grid
        - Grid Forum Data Working Group

  • GridFTP vs. WebDAV
    - WebDAV extends HTTP for remote data access
      - Combines control and data over single channel
    - FTP splits control and data
      - Supports multiple, user selectable data channel protocols
    - Advantage to split channels
      - Third party transfers handled cleanly
      - Can (cleanly) define new data channel protocols
        - E.g. parallel/striped transfer, automatic TCP buffer/window negotiation, non-TCP based protocols, etc.
      - Amenable to high-performance proxies
        - E.g. for firewalls, load balancing, etc.

  • The GridFTP Family of Tools
    - Patches to existing FTP code
      - GSI-enabled versions of existing FTP client and server, for high-quality production code
    - Custom-developed libraries
      - Implement full GridFTP protocol, targeting custom use, high-performance
    - Custom-developed tools
      - Servers and clients with specialized functionality and performance

  • A Word on GASS
    - The Globus Toolkit provides services for file and executable staging and I/O redirection that work well with GRAM. This is known as Globus Access to Secondary Storage (GASS).
    - GASS uses GSI-enabled HTTP as the protocol for data transfer, and a caching algorithm for copying data when necessary.
    - The globus_gass, globus_gass_transfer, and globus_gass_cache APIs provide programmer access to these capabilities, which are already integrated with the GR