Globus – Part II
Sathish Vadhiyar
Globus Information Service
MDS
Metacomputing Directory Service, later renamed Monitoring and Discovery Service
For publishing and accessing system and application data
Can restrict access to MDS information by using GSI
Interacts with local information services – hour-glass mechanism
Provides caching to minimize repeated transfer of up-to-date information and lessen network overhead
MDS
Integrates existing systems while providing a uniform and extensible data model
Uniform API
Adopts data representation and API, query language and protocol from LDAP directory service
Uses 2 protocols
  GRIP – for providing information about entities
  GRRP – for registering entities
LDAP query language supports:
  Search
  Enquiry
  Subscription
MDS Architecture
GIIS – Grid Index Information Service
GRIS – Grid Resource Information Service
MDS
Support for multiple information service providers - information providers specified on a per attribute basis
MDS Data:
  System information: architecture, OS
  Network information
  Load status
Additional information sent to GIIS by GRAM reporter
  Job status
  Queue information
Information viewed through web browser or web client commands
MDS
Contains entries where each entry is associated with one or more attribute:value pairs
Each entry is associated with a distinguished name.
Object classes are associated with entries – for object types
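The entry model above can be sketched in a few lines of Python. This is only an illustration of entries, distinguished names, object classes and attribute:value pairs. The class `Entry`, the function `search`, the object class names and all host names and attribute values are hypothetical, and a real MDS deployment would answer such queries through an LDAP server rather than an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    dn: str                      # distinguished name, most specific part first
    object_classes: list         # object types this entry belongs to
    attributes: dict = field(default_factory=dict)  # attribute -> list of values


def search(entries, base_dn, object_class=None, **required):
    """Return entries below base_dn that match the object class and attribute values."""
    results = []
    for entry in entries:
        # An entry is "below" the base if its DN ends with the base DN.
        if not entry.dn.lower().endswith(base_dn.lower()):
            continue
        if object_class and object_class not in entry.object_classes:
            continue
        if all(value in entry.attributes.get(name, [])
               for name, value in required.items()):
            results.append(entry)
    return results


# Hypothetical directory contents, for illustration only.
directory = [
    Entry("hn=node1.example.org, o=Grid", ["ComputeResource"],
          {"os": ["Linux"], "arch": ["x86"]}),
    Entry("hn=node2.example.org, o=Grid", ["ComputeResource"],
          {"os": ["Solaris"], "arch": ["sparc"]}),
    Entry("nn=192.0.2.0, o=Grid", ["Network"], {"type": ["ethernet"]}),
]

linux_hosts = search(directory, "o=Grid", object_class="ComputeResource", os="Linux")
print([entry.dn for entry in linux_hosts])
```

The suffix test on the distinguished name is a simplification: it stands in for the subtree search that a directory server performs on its tree of entries.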
Distinguished name example
Another Example
Distinguished names for Networks
Globus Data Grid
Data Grid
Challenges:
  Petabytes and terabytes of data
  Query management over this huge volume of data
  Cache management
  Providing gigabit/sec QoS
  Coscheduling data transfers and computation
  Selection of dataset replicas
  Maximize use of scarce storage, computation and network resources
Data Grid Motivation
Application requirements:
1. A reliable secure high-performance data transfer protocol
2. Management of multiple copies of files and collections of files
Data Grid Architecture
GridFTP
Secure file transfer over Grid
Multiple data channels for parallel transfers – using multiple TCP streams in parallel to improve aggregate bandwidth
Partial file transfers
Third-party (direct server-to-server) transfers by adding GSSAPI security to the existing third-party data transfers in FTP standard – transfers between 2 servers mediated by a third-party client
GSSAPI operations authenticate the third party to the source and destination machines of data transfer
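The idea behind parallel and partial transfers can be sketched as follows. This is a minimal simulation, not GridFTP code: the "file" is an in-memory byte string, each worker thread stands in for one TCP data channel, and the names `split_ranges`, `fetch_range` and `parallel_copy` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Divide [0, size) into at most `streams` contiguous (offset, length) ranges."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
            offset += length
    return ranges


def fetch_range(source, offset, length):
    """Stand-in for a partial file transfer: return one byte range of the source."""
    return offset, source[offset:offset + length]


def parallel_copy(source, streams=4):
    """Fetch all ranges concurrently and reassemble them by offset."""
    dest = bytearray(len(source))
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(fetch_range, source, offset, length)
                   for offset, length in split_ranges(len(source), streams)]
        for future in futures:
            offset, chunk = future.result()
            dest[offset:offset + len(chunk)] = chunk
    return bytes(dest)


data = bytes(range(256)) * 10
assert parallel_copy(data, streams=4) == data
```

Because every chunk carries its own offset, chunks can arrive in any order and still be placed correctly, which is the property that lets several streams share one transfer.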
GridFTP contd…
Authenticated data channels - both GSI and Kerberos security
Reusable data channels
Striped data transfers
2 libraries:
  globus_ftp_control_library – implements control channel API
  globus_ftp_client_library – implements GridFTP API
Plugin mechanisms for fault tolerance, performance monitoring, and extended data processing
Globus Replica Management Architecture
Replica management
  To improve access performance or availability
  Mainly for access to “published” resources – read-only model
Functions:
Architecture:
  Lower level replica catalog API
  Higher level replica management API
Replica catalog
Provides mapping between logical names of files/locations and physical objects on storage systems
Stores 3 kinds of entries
  Logical collection – user defined collections of files – file aggregation
  Location entries – physical locations of files
  Logical files – globally unique names
Replica catalog API provides operations on the replica catalog
Replica management API provides session management, catalog creation, file maintenance, access control
Implemented with LDAP
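The three kinds of catalog entries can be illustrated with a small in-memory structure. This is a sketch under stated assumptions, not the Globus Replica Catalog API: the class `ReplicaCatalog`, its method names, and the storage host names are all hypothetical, and the real catalog keeps these entries in an LDAP directory.

```python
class ReplicaCatalog:
    """Toy catalog mapping logical file names to physical replica URLs."""

    def __init__(self):
        self.collections = {}  # collection name -> set of logical file names
        self.locations = {}    # collection name -> {URL prefix: set of files held there}

    def create_collection(self, name, logical_files):
        """Register a list of logical files as a logical collection."""
        self.collections[name] = set(logical_files)
        self.locations[name] = {}

    def register_location(self, collection, url_prefix, files):
        """Register a complete or partial replica of a collection at one location."""
        unknown = set(files) - self.collections[collection]
        if unknown:
            raise ValueError(f"not in collection {collection!r}: {sorted(unknown)}")
        self.locations[collection][url_prefix] = set(files)

    def lookup(self, collection, logical_file):
        """Return the physical URLs of every replica of one logical file."""
        return sorted(f"{prefix}/{logical_file}"
                      for prefix, files in self.locations[collection].items()
                      if logical_file in files)


catalog = ReplicaCatalog()
catalog.create_collection("climate-1998", ["jan.dat", "feb.dat", "mar.dat"])
# Hypothetical storage systems: one full replica and one partial replica.
catalog.register_location("climate-1998", "gsiftp://storage-a.example.org/climate",
                          ["jan.dat", "feb.dat", "mar.dat"])
catalog.register_location("climate-1998", "gsiftp://storage-b.example.org/cache",
                          ["jan.dat"])
print(catalog.lookup("climate-1998", "jan.dat"))
```

Keeping the file list per location is what allows a location to hold only part of a collection.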
Replica management
Globus Replica Management integrates the Globus Replica Catalog (for keeping track of replicated files) and GridFTP (for moving data) and provides replica management capabilities for data grids.
The globus_replica_management library provides client functions that allow files to be registered with the replica management service, published to replica locations, and moved among multiple locations.
Managing the copying and placement of files in a distributed computing system so as to improve the performance of data analysis
Replica management service - functions
Registration of files with the replica management service
Creation and deletion of replicas of previously registered files
Enquiries concerning the location and performance characteristics of replicas.
Replica selection based on performance characteristics
Replica management
Replica management API – combines storage system operations with calls to low-level catalog API functions
Replica management system controls where and when copies are created and provides information about copies
But does not ensure file consistency
RM API
Session management
  Session handles and attributes
  Restart
  Rollback
Catalog creation and file management
  Creating catalog entries
  Registering files
  Publishing files
  Copying, deleting files
Future ideas
  Incorporating advance reservation
  Automatic replica selection and creation
Data grid projects
  http://www.globus.org/datagrid/projects.html
Replica Catalog Illustration
Replica Selection in Globus Data Grid (Vazhkudai et al.)
Replica selection uses MDS for information regarding characteristics of storage systems
LDAP information organized as DIT (Directory Information Tree)
Each storage resource in Data Grid incorporates GRIS
LDAP can execute shell scripts in the background to obtain various dynamic attributes like availableSpace, mountPoint etc.
Static attributes like seek times can be entered by the system administrator
Attributes like data transfer rates across networks to clients can be obtained based on past performance, i.e., historical data
ClassAds can also be used for expressing storage attributes
Directory for Storage GRIS
Metadata Specification
Performance Data Specification
Steps in Replica Management
1. Application queries metadata expressing desired characteristics of logical files
2. A logical file is returned
3. Application queries replica catalog for replica instances for the logical file
4. Storage broker helps to choose a particular replica
Replica Selection
Storage Architecture steps
1. Application presents classAds regarding replica requirements to the storage broker (SB)
2. SB does search:
   1. Queries replica catalogs for the list of all replicas
   2. Queries individual GRIS of replicas about their characteristics
   3. Collects all information and proceeds to matching
3. Match:
   1. Converts replica capabilities to replica classAds
   2. Matches application classAds to replica classAds
4. Accesses file using GridFTP
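The search and match steps above can be sketched as follows. This is a simplified illustration, not the storage broker from the paper: real ClassAds have their own expression language, whereas here the application's requirements and rank are plain Python callables. The function `select_replica`, the host names and the attribute name `transferRate` are assumptions; `availableSpace` is taken from the slides. The final GridFTP access step is not shown.

```python
def select_replica(logical_file, catalog, gris, requirements, rank):
    """Return the best matching replica description, or None if nothing matches."""
    # Search: list all replicas, then look up each storage system's attributes.
    candidates = []
    for url, host in catalog.get(logical_file, []):
        replica_ad = dict(gris[host], url=url, host=host)  # stands in for a replica classAd
        candidates.append(replica_ad)

    # Match: keep replicas that satisfy the application's requirements,
    # then pick the one the application ranks highest.
    matches = [ad for ad in candidates if requirements(ad)]
    if not matches:
        return None
    return max(matches, key=rank)


# Hypothetical replica catalog and per-storage GRIS data.
catalog = {
    "run42.dat": [
        ("gsiftp://a.example.org/data/run42.dat", "a.example.org"),
        ("gsiftp://b.example.org/data/run42.dat", "b.example.org"),
        ("gsiftp://c.example.org/data/run42.dat", "c.example.org"),
    ]
}
gris = {
    "a.example.org": {"availableSpace": 50, "transferRate": 10.0},
    "b.example.org": {"availableSpace": 500, "transferRate": 4.0},
    "c.example.org": {"availableSpace": 5, "transferRate": 40.0},
}

best = select_replica(
    "run42.dat", catalog, gris,
    requirements=lambda ad: ad["availableSpace"] >= 20,
    rank=lambda ad: ad["transferRate"],
)
print(best["url"] if best else "no suitable replica")
```

Separating requirements (a hard filter) from rank (a preference) mirrors how matchmaking first excludes unsuitable candidates and only then orders the rest.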
Globus References / sources / credits
Grid Information Services for Distributed Resource Sharing. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.
Usage of LDAP in Globus. I. Foster, G. von Laszewski. This short note describes the use of LDAP in the Globus toolkit. It answers three questions: What is LDAP? Where is it used? and Why is it used in Globus?
A Directory Service for Configuring High-Performance Distributed Computations. S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke. Proc. 6th IEEE Symposium on High-Performance Distributed Computing, pp. 365-375, 1997. Describes the Metacomputing Directory Service used to maintain information about Globus components.
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke. Journal of Network and Computer Applications, 23:187-200, 2001 (based on conference publication from Proceedings of NetStore Conference 1999).
Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke. IEEE Mass Storage Conference, 2001. Presents the design and performance characteristics of two fundamental technologies for data management.
Replica Selection in the Globus Data Grid. S. Vazhkudai, S. Tuecke, I. Foster. Proceedings of the First IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2001), pp. 106-113, IEEE Computer Society Press, May 2001. Discusses a high-level replica selection service that uses information regarding replica location and user preferences to guide selection from among storage replica alternatives.
RFT (Reliable File Transfer)
Treats movement of multiple files as a single job
Accepts transfer requests and reliably manages requests
OGSI compliant
To transfer data reliably between two GridFTP servers
Uses Grid Service Handles (GSH)
Acts as a proxy for the user, acts as client on user’s behalf for third-party transfers
RFT
Client submits SOAP description of data transfer job
Maintains checkpoints in a database
Supports both “push” and “pull” mechanisms
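How a database of checkpoints lets a multi-file job survive failures can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the RFT service: it checkpoints whole files only, the table layout and the functions `open_job_store`, `submit` and `run` are hypothetical, and the `transfer` callable stands in for a third-party GridFTP transfer.

```python
import sqlite3


def open_job_store(path=":memory:"):
    """Open the checkpoint database and create the table if it is missing."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        "job TEXT, src TEXT, dst TEXT, done INTEGER DEFAULT 0, "
        "PRIMARY KEY (job, src, dst))"
    )
    return db


def submit(db, job, pairs):
    """Record every (source, destination) pair of a job before any data moves."""
    db.executemany(
        "INSERT OR IGNORE INTO transfers (job, src, dst) VALUES (?, ?, ?)",
        [(job, src, dst) for src, dst in pairs],
    )
    db.commit()


def run(db, job, transfer):
    """Attempt each unfinished transfer once; return how many are still pending."""
    pending = db.execute(
        "SELECT src, dst FROM transfers WHERE job = ? AND done = 0 ORDER BY src",
        (job,),
    ).fetchall()
    for src, dst in pending:
        try:
            transfer(src, dst)
        except OSError:
            continue  # no checkpoint written, so a later run retries this file
        db.execute(
            "UPDATE transfers SET done = 1 WHERE job = ? AND src = ? AND dst = ?",
            (job, src, dst),
        )
        db.commit()  # checkpoint: this file will not be transferred again
    remaining = db.execute(
        "SELECT COUNT(*) FROM transfers WHERE job = ? AND done = 0", (job,)
    ).fetchone()[0]
    return remaining
```

Calling `run` again after a failure, or after a restart with the same database file, picks up only the files that have no checkpoint yet.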
Data Grid Replica Services
Need for meta-data services
Various kinds:
  Application metadata
  Replica metadata
  System configuration metadata
Replica management
  For better performance or availability to accesses
  Mainly for access to “published” resources – read-only model
Replica Catalog
Provides mappings between logical names for files or collections and one or more copies of those objects on physical systems
Services provided by replica catalog:
  Registering a list of files as a logical collection
  Registering the physical location of a complete or partial replica of a logical collection
  Registering information about a particular logical file in a logical collection
  Modifying the contents of registered entities of the catalog
  Responding to queries of the catalog
The Globus Replica Catalog supports replica management by providing mappings between logical names for files and one or more copies of the files on physical storage systems