Overview of gLite Middleware
Esther Montes Prado, CIEMAT
10th EELA Tutorial, Madrid, 7.5.2007
EGEE / EELA / EuMedGrid Joint Grid Tutorial
Outline
1. Introduction
2. Overview of gLite services
3. Summary and conclusions
New! = New in gLite 3.0
Introduction: the Grid goals
• The Grid connects Instruments, Computer Centres, Scientists
• If the Web is able to share information, the Grid is intended to share computing power and storage
The Actors and their interconnections
[Figure: actors UI, RB/BDII, CE, WNs, SE and LFC. Flow: connections to the UI; resources searching; job sent to the batch system; distribution to CPUs; outputs copied to storage resources; catalogs keeping track of the inputs.]
Introduction: Grid Middleware
• Grid Middleware: layer between user applications and grid resources
• Grid Middleware should:
  - Find convenient places for applications to be run
  - Optimise use of resources
  - Organise efficient access to data
  - Deal with authentication to the different sites that are used
  - Run the job & monitor progress
  - Recover from problems
  - Transfer the result back to the scientists
Introduction: Virtual Organisations
• The users of a Grid infrastructure are divided into Virtual Organisations (VOs).
• VOs are abstract entities grouping users, institutions and resources in the same administrative domain.
• The EGEE VOs correspond to real organisations or projects: LHC experiments, the community of biomedical researchers, etc.
gLite Services Decomposition
[Figure: 6 High Level Services + CLI & API]
Legend:
• Available
• Foreseen in the architecture (only Job Provenance will be available by the end of EGEE-II)
Middleware structure
• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are supposed to help users build their computing infrastructure but should not be mandatory
• Foundation Grid Middleware will be deployed on the EGEE infrastructure
  - Must be complete and robust
  - Should allow interoperation with other major grid infrastructures
  - Should not assume the use of Higher-Level Grid Services
New!
Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
Grid Foundation: Security
• Authentication based on X.509 PKI infrastructure
  - Certificate Authorities (CA) issue (long lived) certificates identifying individuals (much like a passport)
  - Commonly used in web browsers to authenticate to sites
  - Trust between CAs and sites is established (offline)
  - In order to reduce vulnerability, on the Grid user identification is done by using (short lived) proxies of their certificates
• Proxies can
  - Be delegated to a service such that it can act on the user’s behalf
  - Include additional attributes (like VO information via the VO Membership Service - VOMS)
  - Be stored in an external proxy store (MyProxy)
  - Be renewed (in case they are about to expire)
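The idea of short-lived proxies that are renewed before they expire can be sketched in a few lines of Python. This is only an illustration, not gLite or MyProxy code: the Proxy class, the subject string, the 12-hour lifetime and the one-hour renewal margin are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Proxy:
    """Hypothetical stand-in for a short-lived proxy certificate."""
    subject: str
    not_after: datetime  # expiry time of the proxy


def needs_renewal(proxy: Proxy, now: datetime,
                  margin: timedelta = timedelta(hours=1)) -> bool:
    """Return True when the proxy expires within `margin` of `now`."""
    return proxy.not_after - now <= margin


issued = datetime(2007, 5, 7, 9, 0)
proxy = Proxy(subject="/C=ES/O=Example/CN=Some User/CN=proxy",  # invented DN
              not_after=issued + timedelta(hours=12))           # assumed lifetime

print(needs_renewal(proxy, issued))
print(needs_renewal(proxy, issued + timedelta(hours=11, minutes=30)))
```

A long-running job would perform a check of this kind periodically and ask a proxy store for a fresh proxy when it returns True.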
AuthN and AuthZ: pre-VOMS
• Authentication
  - User receives certificate signed by CA
  - Connects to “UI” by ssh
  - Downloads certificate
  - Single logon to Grid: create proxy, then Grid Security Infrastructure identifies user to other machines
• Authorisation
  - User joins Virtual Organisation
  - VO negotiates access to Grid nodes and resources
  - Authorisation tested by CE
  - grid-mapfile maps user to local account
[Figure: numbered steps 1 to 3 connecting the CA, the VO mgr (AUP, personal/once), the VO database and VO service, and the UI, with GSI and grid-mapfiles on Grid services refreshed by a daily update.]
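The grid-mapfile lookup described above can be sketched as follows. A grid-mapfile line pairs a quoted certificate DN with a local account name. The DNs and account names below are invented, and the parser is a minimal illustration rather than the code a CE actually runs.

```python
import shlex

# Hypothetical grid-mapfile content: a quoted certificate DN followed by
# the local account it maps to. DNs and account names are made up.
GRID_MAPFILE = '''
# comment lines and blank lines are skipped
"/C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Some User" gilda001
"/C=ES/O=Example/CN=Another User" biomed007
'''


def parse_grid_mapfile(text: str) -> dict:
    """Map each certificate DN to its local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dn, account = shlex.split(line)  # shlex honours the quoted DN
        mapping[dn] = account
    return mapping


def authorise(dn: str, mapping: dict):
    """Return the local account for `dn`, or None if the user is unknown."""
    return mapping.get(dn)


accounts = parse_grid_mapfile(GRID_MAPFILE)
print(authorise("/C=ES/O=Example/CN=Another User", accounts))
print(authorise("/C=XX/CN=Stranger", accounts))
```

Because the file only maps a DN to one account, every member of a VO ends up with the same rights, which is the limitation VOMS addresses.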
Evolution of VO management in gLite
• Before VOMS
  - User is authorised as a member of a single VO
  - All VO members have same rights
  - Gridmapfiles are updated by VO management software: map the user’s DN to a local account
  - grid-proxy-init: derives proxy from certificate (the “single sign-on to the grid”)
• VOMS
  - User can be in multiple VOs: aggregate rights
  - VO can have groups: different rights for each, different groups of experimentalists, nested groups
  - VO has roles: assigned to specific purposes (e.g. system admin), applied when the user assumes the role
  - Proxy certificate carries the additional attributes
  - voms-proxy-init
In gLite only VOMS is used
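VOMS attributes are usually written as FQANs of the form /vo/group/subgroup/Role=name. The sketch below splits such a string into VO, nested groups and role. The group and role names are invented for illustration, and capabilities are simply ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fqan:
    vo: str
    groups: List[str]           # nested groups below the VO, outermost first
    role: Optional[str] = None  # None when no role is asserted


def parse_fqan(fqan: str) -> Fqan:
    """Split an FQAN such as /vo/group/Role=name into its parts."""
    role = None
    names = []
    for part in (p for p in fqan.split("/") if p):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capabilities are ignored in this sketch
        else:
            names.append(part)
    return Fqan(vo=names[0], groups=names[1:], role=role)


# Hypothetical attributes carried by one proxy (group and role names invented).
attributes = ["/gilda/tutors/Role=admin", "/gilda/Role=NULL/Capability=NULL"]
for attr in attributes:
    print(parse_fqan(attr))
```

A service can then grant rights per group or per role instead of treating all VO members alike.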
VOMS: concepts
Virtual Organization Membership Service:
• Extends the proxy with info on VO membership, group, roles
• Fully compatible with GSI
• Each VO has a database containing group membership, roles and capabilities information for each user
• User contacts VOMS server requesting his authorization info
• Server sends authorization info to the client, which includes it in a proxy certificate
[glite-tutor] /home/giorgio > voms-proxy-init --voms gilda
Cannot find file or dir: /home/giorgio/.glite/vomses
Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Emidio Giorgio/[email protected]
GRID pass phrase:
Your proxy is valid until Mon Jan 30 23:35:51 2006
Creating temporary proxy................................. Done
Contacting voms.ct.infn.it:15001 [/C=IT/O=GILDA/OU=Host/L=INFN Catania/CN=voms.ct.infn.it/[email protected]] "gilda"
Creating proxy ...................................... Done
Your proxy is valid until Mon Jan 30 23:35:51 2006
[Figure: the client sends an authenticated request; the VOMS server queries its AuthDB and returns a VOMS AC, which is carried in the proxy C=IT/O=INFN/L=CNAF/CN=Pinco Palla/CN=proxy.]
Grid Foundation: Information Systems
• Provide information about Grid resources and their status.
• This information is essential for the operation of the whole Grid, as it is via IS that the resources are discovered.
• The published information is also used for monitoring and accounting purposes.
• Two different approaches:
  - Berkeley DB Information Index (BDII)
  - Relational Grid Monitoring Architecture (R-GMA)
Berkeley DB Information Index
[Figure: SEs and CEs each with a local GRIS, CEs hosting site GIISes, and two BDIIs (BDII-A, BDII-B) serving user applications, the WMS and monitoring services.]
• Extension of Globus MDS adopted in LCG as the information provider
• Information hierarchically distributed, following the GLUE schema
• MDS implements the GLUE schema using OpenLDAP
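Since the information is served over LDAP, a client typically receives records of GLUE attributes. The sketch below parses a simplified LDIF-style sample and extracts the free CPUs per CE. The host names and numbers are invented, the sample omits the dn lines and line folding of real LDIF, and no LDAP server is contacted.

```python
# Hypothetical LDIF-style records, shaped like what an LDAP query against a
# BDII might return. Host names and numbers are invented for illustration.
SAMPLE_LDIF = """\
GlueCEUniqueID: ce01.example.org:2119/jobmanager-lcgpbs-short
GlueCEStateFreeCPUs: 12
GlueCEStateWaitingJobs: 3

GlueCEUniqueID: ce02.example.org:2119/jobmanager-lcgpbs-long
GlueCEStateFreeCPUs: 0
GlueCEStateWaitingJobs: 40
"""


def parse_ldif(text):
    """Turn blank-line-separated 'key: value' records into dictionaries."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, _, value = line.partition(": ")
            entry[key] = value
        entries.append(entry)
    return entries


def free_cpus_by_ce(entries):
    """Map each CE identifier to its advertised number of free CPUs."""
    return {e["GlueCEUniqueID"]: int(e["GlueCEStateFreeCPUs"]) for e in entries}


print(free_cpus_by_ce(parse_ldif(SAMPLE_LDIF)))
```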
Relational Grid Monitoring Architecture (R-GMA)
• Based on the GMA from the Global Grid Forum (GGF, now OGF)
• Provides a uniform method to access and publish distributed information and monitoring data
  - Used for job and infrastructure monitoring in gLite 3.0
  - Information provided through a publish and consume mechanism
  - Appearance of a single federated DB to query with SQL (each VO as one virtual database, VDB)
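The "single federated DB" view can be imitated with an in-memory SQLite database: several sites publish tuples and a consumer reads them back with one SQL query. This is only an analogy under stated assumptions. R-GMA is not built on SQLite, and the table name JobStatus and its columns are invented for the example.

```python
import sqlite3

# One in-memory database stands in for the virtual database of a single VO.
vdb = sqlite3.connect(":memory:")
vdb.execute("CREATE TABLE JobStatus (site TEXT, job_id TEXT, state TEXT)")


def publish(site, job_id, state):
    """Producer side: a site publishes one tuple into the shared table."""
    vdb.execute("INSERT INTO JobStatus VALUES (?, ?, ?)", (site, job_id, state))


def running_jobs_per_site():
    """Consumer side: one SQL query, regardless of which site published."""
    rows = vdb.execute(
        "SELECT site, COUNT(*) FROM JobStatus "
        "WHERE state = 'Running' GROUP BY site ORDER BY site"
    )
    return dict(rows.fetchall())


publish("site-a", "job-1", "Running")
publish("site-a", "job-2", "Done")
publish("site-b", "job-3", "Running")
publish("site-b", "job-4", "Running")
print(running_jobs_per_site())
```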
Grid foundation: Computing Element
• A CE refers to a set of computational resources (cluster, computing farm, etc.):
  - CE Acceptance (CEA): generic interface to the cluster. Includes the functionality of a site Gatekeeper
  - LRMS (batch system): Condor, OpenPBS, Torque/Maui, LSF
  - The cluster itself: Worker Nodes (WNs)
  - CE Monitor (CEMon): deals with notifications about CE status, requests jobs from the WMS (pull mode)
[Figure: Client, CEA, JC, MON, LRMS, WNs]
• For job submission, the CE is able to work in pull or in push mode
• The CE is responsible for collecting accounting info.
Grid foundation: Computing Element
• Two Gatekeeper implementations in gLite 3.0:
  - LCG-CE: developed by EDG and adopted as the LCG-2 CE. Gatekeeper based on pre-WS GRAM (Globus Toolkit 2)
  - gLite-CE: deployed for the first time in gLite 3.0 (testing). Exploits the new Globus gatekeeper, but with a Job Manager based on GSI-enabled Condor-C. More efficient, uses the BLAH protocol
New!
Grid foundation: Accounting
• Accounting services accumulate information about resource usage by users or groups of users (VOs)
• APEL (Accounting using PBS Event Logging):
  - Uses R-GMA to propagate and display job accounting information for infrastructure monitoring
  - Reads LRMS log files provided by gLite-CE.
• DGAS (Distributed Grid Accounting System):
  - Collects, stores and transfers accounting data. Compliant with privacy requirements
  - Reads LRMS log files provided by LCG-CE.
  - Stores information in a site database (HLR) and optionally in a central HLR. Access granted to user, site and VO administrators
  - Not yet certified in gLite 3.0.
New!
Grid foundation: Storage Element
• An SE provides uniform access to data storage resources (disk servers, tape-based MSS, etc.).
• Common interface: Storage Resource Management System (SRM). Various implementations from LCG and other projects (DPM, CASTOR, dCache)
• GridFTP (GSI-enabled FTP) is the protocol for whole-file transfers.
• POSIX-like client file access through the GFAL libraries (Grid File Access Library).
Type        Resources      File Transfer  Remote File I/O  SRM
Classic SE  Disk server    GSIFTP         Insecure RFIO    NO
DPM         Disk Pool      GSIFTP         Secure RFIO      YES
dCache      Disk Pool/MSS  GSIFTP         gsidcap          YES
CASTOR      MSS            GSIFTP         Insecure RFIO    YES
High Level Services: Catalogs
• Data stored in different locations – in most cases there is no shared file system or common name space.
• File and Replica Catalogs:
  - Keep track of the location of copies (replicas) of Grid files
  - Store information about data and metadata that is being operated on in the Grid
  - Grid Catalogs are used to manage Grid file namespaces and location of files, to store and retrieve metadata and to keep authorization info of the files
High Level Services: Catalogs
• LCG File Catalog (LFC) from LCG:
  - Stores location(s) of files and replicas
  - LFC maps LFNs or GUIDs to SURLs of files
  - Integrated GSI Authorization + Authentication, CLIs and APIs: all operations require a valid proxy
  - LFC supports Oracle and MySQL as database backends
• AMGA Metadata Catalog: generic metadata catalogue
  - Used mainly by Biomed
  - Not yet certified in gLite 3.0. Certification will start soon.
New!
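The mapping from LFNs or GUIDs to replica SURLs can be pictured with a toy in-memory catalog. This is a sketch of the data structure only and does not use the LFC API. The class name, host names and paths are assumptions made for the example.

```python
import uuid


class ToyFileCatalog:
    """In-memory stand-in for a file catalog; not the LFC API."""

    def __init__(self):
        self._guid_by_lfn = {}    # LFN  -> GUID
        self._surls_by_guid = {}  # GUID -> list of replica SURLs

    def register(self, lfn, surl):
        """Register a new logical file with its first replica."""
        guid = "guid:" + str(uuid.uuid4())
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid[guid] = [surl]
        return guid

    def add_replica(self, guid, surl):
        """Record an additional copy of an already registered file."""
        self._surls_by_guid[guid].append(surl)

    def replicas(self, name):
        """Resolve an LFN or a GUID to the SURLs of all replicas."""
        guid = self._guid_by_lfn.get(name, name)
        return list(self._surls_by_guid.get(guid, []))


catalog = ToyFileCatalog()
guid = catalog.register("lfn:/grid/gilda/demo/input.dat",
                        "srm://se01.example.org/gilda/input.dat")
catalog.add_replica(guid, "srm://se02.example.org/gilda/input.dat")
print(catalog.replicas("lfn:/grid/gilda/demo/input.dat"))
print(catalog.replicas(guid))
```

The LFN is the human-readable name, the GUID identifies the file uniquely, and each SURL points at one physical copy on an SE.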
High Level Services: File transfer
• FTS: Reliable, scalable and customizable file transfer
  - Scheduled file transfers: submit a job
  - Manages transfers through channels: mono-directional network pipes between two sites
  - Web service interface
  - Automatic discovery of services
  - Support for different user and administrative roles
  - Adding support for pre-staging and new proxy renewal schema
  - In the medium term add support for SRMv2, delegation, VOMS-aware proxy renewal
New!
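The notion of a channel as a one-way pipe between two sites can be sketched with a queue per (source, destination) pair. The class and site names below are hypothetical and the code does not reflect the FTS interface. It only shows that a transfer is accepted when a channel exists in that direction and is then served in submission order.

```python
from collections import defaultdict, deque


class ToyTransferService:
    """Queues transfer requests per channel; a channel is (source, destination)."""

    def __init__(self, channels):
        self._channels = set(channels)     # channels set up by administrators
        self._queues = defaultdict(deque)  # channel -> pending file names

    def submit(self, source, destination, filename):
        """Accept a transfer only if a channel exists in that direction."""
        channel = (source, destination)
        if channel not in self._channels:
            raise ValueError(f"no channel from {source} to {destination}")
        self._queues[channel].append(filename)

    def next_transfer(self, source, destination):
        """Pop the oldest pending file on a channel, or None if idle."""
        queue = self._queues[(source, destination)]
        return queue.popleft() if queue else None


# Site names are invented; only the SITE-A -> SITE-B direction is configured.
fts = ToyTransferService(channels=[("SITE-A", "SITE-B")])
fts.submit("SITE-A", "SITE-B", "run001.dat")
fts.submit("SITE-A", "SITE-B", "run002.dat")
print(fts.next_transfer("SITE-A", "SITE-B"))
```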
High Level Services: Workload Mgmt.
• Workload Management System (WMS) helps users access computing resources:
  - Resource brokering: accepts user jobs, assigns them to the appropriate CE, records their status and retrieves their output
  - Management of job input/output files
  - The Resource Broker (RB) is the machine where the WMS services run
• Two WMS implementations in gLite 3.0:
  - LCG-2 RB: GT2 + CondorG (from EDG). To be replaced when gLite WMS proves to be reliable
  - gLite WMS: Web Service (WMProxy) + CondorG. Deployed for the first time in gLite 3.0. Management of complex workflows (DAGs) and compound jobs
New!
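Resource brokering amounts to filtering the CEs that satisfy a job's requirements and ranking the survivors. The sketch below illustrates the idea with invented CE descriptions and an assumed ranking rule (more free CPUs first, then fewer waiting jobs). It is not the matchmaking algorithm of the WMS.

```python
def match(job, computing_elements):
    """Return the CEs that satisfy the job's requirements, best rank first."""
    candidates = [
        ce for ce in computing_elements
        if job["vo"] in ce["supported_vos"]
        and ce["max_wall_minutes"] >= job["wall_minutes"]
    ]
    # Rank: prefer more free CPUs, then fewer waiting jobs.
    return sorted(candidates,
                  key=lambda ce: (-ce["free_cpus"], ce["waiting_jobs"]))


# Invented CE descriptions, of the kind an information system could supply.
ces = [
    {"name": "ce01", "supported_vos": {"gilda"}, "max_wall_minutes": 60,
     "free_cpus": 12, "waiting_jobs": 3},
    {"name": "ce02", "supported_vos": {"gilda", "biomed"}, "max_wall_minutes": 2880,
     "free_cpus": 4, "waiting_jobs": 0},
    {"name": "ce03", "supported_vos": {"biomed"}, "max_wall_minutes": 2880,
     "free_cpus": 50, "waiting_jobs": 0},
]
job = {"vo": "gilda", "wall_minutes": 30}
print([ce["name"] for ce in match(job, ces)])
```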
High Level Services: Job Information
• Logging and Bookkeeping service:
  - Tracks jobs managed by the WMS during their lifetime (in terms of events)
  - Collects events from many WMS components and records the status and history of the jobs
  - L&B API and CLI to query jobs
  - Support for “CE reputability ranking”: maintains recent statistics of job failures at CEs and feeds them back to the WMS to aid planning
• Job Provenance: stores long term job information
  - Supports job rerun
  - If deployed will also help unloading the L&B
  - Not yet certified in gLite 3.0.
New!
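The "CE reputability ranking" idea, keeping recent failure statistics per CE and feeding them back to planning, can be sketched with a sliding window of job outcomes. The class name, the window size and the CE names are assumptions for illustration and do not describe how L&B implements it.

```python
from collections import defaultdict, deque


class FailureStats:
    """Keeps the outcome of the most recent jobs seen at each CE."""

    def __init__(self, window=20):
        # Each CE keeps at most `window` outcomes; older ones drop out.
        self._recent = defaultdict(lambda: deque(maxlen=window))

    def record(self, ce, succeeded):
        """Store one job outcome (True for success) for the given CE."""
        self._recent[ce].append(bool(succeeded))

    def failure_rate(self, ce):
        """Fraction of recent jobs that failed; 0.0 for an unseen CE."""
        outcomes = self._recent[ce]
        if not outcomes:
            return 0.0
        return outcomes.count(False) / len(outcomes)

    def ranked(self, ces):
        """Order CEs from most to least reliable on recent evidence."""
        return sorted(ces, key=self.failure_rate)


stats = FailureStats(window=4)
for ok in (True, False, False, False):
    stats.record("ce01", ok)
for ok in (True, True, True, False):
    stats.record("ce02", ok)
print(stats.ranked(["ce01", "ce02"]))
```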
Job Workflow in gLite
[Figure: components UI, Resource Broker, Job Submission Service, Computing Element, Storage Element, Information Service, LFC Catalog, Logging & Book-keeping and Author. & Authen. Flow labels: voms-proxy-init; job submit event (JDL, input “sandbox”); DataSets info; SE & CE info (publish); expanded JDL; input “sandbox” + Broker Info, Globus RSL; job status; job query; output “sandbox”.]
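The JDL that the UI submits is a list of attribute assignments. The helper below renders a small job description from a Python dictionary. It is a minimal sketch that handles only string and list values. Expressions such as Requirements or Rank are not covered, and the file names are invented.

```python
def jdl_value(value):
    """Render a Python value in JDL syntax: quoted strings, {...} lists."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Render a dict of attributes as one 'Name = value;' line each."""
    return "\n".join(f"{name} = {jdl_value(value)};"
                     for name, value in attributes.items())


job = {
    "Executable": "/bin/hostname",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "OutputSandbox": ["std.out", "std.err"],  # files returned to the UI
}
print(render_jdl(job))
```

The files named in OutputSandbox are the ones that travel back to the UI as the output “sandbox” in the workflow above.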
Summary
• gLite is the next generation middleware for Grid Computing:
  - gLite 3.0 is an important milestone in EGEE-II
  - Exploits experience and existing components from VDT (Condor, Globus), EDG/LCG, AliEn and other projects.
  - Latest release, gLite 3.0, includes LCG-2.7.0
  - Develops a lightweight stack of generic middleware useful to EGEE applications (HEP and biomedical are pilot applications)
  - New components deployed for the first time on the Production Infrastructure
  - Collaboration with other projects for interoperability and definition/adoption of international standards
References
• Web site: http://www.glite.org
• Architecture and design documents: http://egee-jra1.web.cern.ch/egee%2Djra1/
• General documentation: http://glite.web.cern.ch/glite/documentation/
Questions?
www.glite.org