
GRAM: Software Provider Forum

Stuart Martin
Computational Institute, University of Chicago & Argonne National Lab

TeraGrid 2007
Madison, WI

GRAM - Basic Job Submission and Control Service

A uniform service interface for remote job submission and control
– Includes file staging and I/O management
– Includes reliability features
– Supports basic Grid security mechanisms
– Asynchronous monitoring
– Interfaces with local resource managers, simplifies the job of metaschedulers/brokers

GRAM is not a scheduler.
– No scheduling
– No metascheduling/brokering

Performance Comparisons

Concurrent Jobs (as in paper)

Stage In   Stage Out   File Clean Up   Unique Job Dir   GRAM2   GRAM4
None       None        No              No               2552    2100
1X10KB     1X10KB      No              No               2608    3779
1X10KB     1X10KB      Yes             Yes              2698    5695

Average seconds per 1000 jobs, Condor-G to GRAM to Condor LRM

Concurrent Jobs (as will be in GT 4.0.5)

Stage In   Stage Out   File Clean Up   Unique Job Dir   GRAM2   GRAM4
None       None        No              No               2552    2176
1X10KB     1X10KB      No              No               2608    2147
1X10KB     1X10KB      Yes             Yes              2698    2254

Average seconds per 1000 jobs, Condor-G to GRAM to Condor LRM
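To make the difference between the two concurrent-job tables easier to read, the short sketch below computes the relative change in the GRAM4 column from the "as in paper" figures to the GT 4.0.5 figures. The numbers are copied from the tables; the variable and function names are illustrative only and are not part of any GRAM tooling.

```python
# GRAM4 average seconds per 1000 jobs, copied from the two tables above.
# Keys are (stage in, stage out, file clean up, unique job dir).
as_in_paper = {
    ("None", "None", "No", "No"): 2100,
    ("1X10KB", "1X10KB", "No", "No"): 3779,
    ("1X10KB", "1X10KB", "Yes", "Yes"): 5695,
}
gt_4_0_5 = {
    ("None", "None", "No", "No"): 2176,
    ("1X10KB", "1X10KB", "No", "No"): 2147,
    ("1X10KB", "1X10KB", "Yes", "Yes"): 2254,
}


def relative_change(before, after):
    """Fractional change from before to after (negative means faster)."""
    return (after - before) / before


changes = {
    cfg: relative_change(as_in_paper[cfg], gt_4_0_5[cfg]) for cfg in as_in_paper
}

for cfg, change in changes.items():
    print(cfg, f"{change:+.1%}")
```

With these figures the two staging configurations take roughly 43% and 60% less time, while the configuration without staging is about 4% slower.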

Improving performance for staging jobs

Adding local method call mechanism for general use in Java WS Core (4.0.5)
– GRAM is doing this with RFT
– Any service which calls another in-process service could make similar modifications for local calls and likely benefit from improved performance

Adding caching of the GridFTP server connections in RFT (4.0.6)

Sequential Jobs

Delegation   Stage In   Stage Out   GRAM2   GRAM4
None         None       None        N/A     1.70
Per Job      None       None        1.07    3.53
Per Job      1X10KB     None        1.78    5.57
Shared       1X10KB     None        N/A     5.41
Per Job      1X10KB     1X10KB      2.44    9.08
Shared       1X10KB     1X10KB      N/A     7.91

Average seconds per job (Fork)

Sequential Jobs

Delegation   Stage In   Stage Out   GRAM2   GRAM4
None         None       None        N/A     1.46
Per Job      None       None        1.07    3.42
Per Job      1X10KB     None        1.78    3.46
Shared       1X10KB     None        N/A     3.51
Per Job      1X10KB     1X10KB      2.44    5.25
Shared       1X10KB     1X10KB      N/A     3.67

Average seconds per job (Fork)

GRAM Auditing

TG Gateways

Lower the barrier for scientists and their applications to use TeraGrid resources

Provide an application or domain-specific interface that a scientist can easily understand

Each gateway may have 100s or 1000s of users accessing TG resources

Must be efficient and scale

Use Cases

Group Access
– For efficiency, a “community” credential is used to multiplex many users over a single ID

Query Job Accounting
– Gateways need a remote interface to obtain the TG units charged for their users’ jobs

Auditing
– Grid services provide access to resources
– TG Resource Providers need a record of actions performed by services

Requirements From Use Cases

Grid Job Identifier

Remote client interface to auditing and accounting information

Creation of service audit and accounting information

Access to remote LRM accounting information from the audit service

Scalability in storing information/records

Secure access (authentication and authorization) to audit and accounting information

Grid Job Identifier

Uniquely identifies a job

Shared between the client (Gateway) and service (TG RP)

Obtained in the normal service interaction/protocol

In GRAM4 it is derived by converting the EPR

In GRAM2 it is the job contact (as is)

GRAM4 example:

GRAM4 EPR:

<ns1:managedJobEndpoint xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job">
  <ns2:Address xmlns:ns2="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    https://127.0.0.1:8443/wsrf/services/ManagedExecutableJobService
  </ns2:Address>
  <ns3:ReferenceProperties xmlns:ns3="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <ns1:ResourceID>cca8169a-c65f-11da-a61c-000d61215ff0</ns1:ResourceID>
  </ns3:ReferenceProperties>
  <ns4:ReferenceParameters xmlns:ns4="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
</ns1:managedJobEndpoint>

Grid Job ID:

https://127.0.0.1:8443/wsrf/services/ManagedExecutableJobService?QQDzjbFVYImtVg8
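The slides do not spell out how the EPR is turned into a Grid Job ID. The example only shows that the ID has the shape "service address, then ?, then a key". The sketch below is therefore an assumption, not the GRAM4 algorithm: it takes the Address element and appends the ResourceID text as the key. Note that the key in the slide's Grid Job ID differs from the ResourceID in the slide's EPR, so the real conversion may encode or look up the key differently. The function name epr_to_gjid is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in the EPR example above.
WSA = "http://schemas.xmlsoap.org/ws/2004/03/addressing"
GRAM = "http://www.globus.org/namespaces/2004/10/gram/job"


def epr_to_gjid(epr_xml: str) -> str:
    """Build '<address>?<resource key>' from a GRAM4 EPR (assumed rule)."""
    root = ET.fromstring(epr_xml)
    address = root.find(f"{{{WSA}}}Address")
    resource = root.find(f"{{{WSA}}}ReferenceProperties/{{{GRAM}}}ResourceID")
    if address is None or resource is None or not address.text or not resource.text:
        raise ValueError("EPR is missing Address or ResourceID")
    # Drop any whitespace or line breaks inside the address text.
    addr = "".join(address.text.split())
    return f"{addr}?{resource.text.strip()}"


EXAMPLE_EPR = """
<ns1:managedJobEndpoint xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job">
  <ns2:Address xmlns:ns2="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    https://127.0.0.1:8443/wsrf/services/ManagedExecutableJobService
  </ns2:Address>
  <ns3:ReferenceProperties xmlns:ns3="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <ns1:ResourceID>cca8169a-c65f-11da-a61c-000d61215ff0</ns1:ResourceID>
  </ns3:ReferenceProperties>
</ns1:managedJobEndpoint>
"""

print(epr_to_gjid(EXAMPLE_EPR))
```

Because the conversion is purely local string handling, a gateway can compute the identifier without another round trip to the service.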

Remote Client Interface

Flexible query interface to retrieve audit and accounting records

Define an operation “getChargeForJob” to return the units consumed by a Grid Job ID

Keep audit service interface separate from GRAM service to allow flexible deployment scenarios
– Allow a single audit service for multiple GRAM services
– Same client interface could be used for other services, for example, charging for data storage or transfers

OGSA-DAI satisfies these requirements

Creation of Service Auditing Information

Added GRAM audit record creation upon job termination
– Record fields: job_grid_id, local_job_id, submission_job_id, subject_name, username, creation_time, queued_time, stage_in_gid, stage_out_gid, clean_up_gid, gt_version, rm_type, job_description, success_flag
– Gerson Galang (APAC) contribution for GRAM4 audit record creation at beginning of job, update after LRM submission, and final update upon termination
– Records are needed soon after job termination

Accounting information is created by the local resource managers
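The record lifecycle described above (create at the beginning of the job, update after LRM submission, final update on termination) can be sketched as three small database operations. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in database, the column types are guesses, and the table name gram_audit and the helper function names are hypothetical. Only the column names come from the slide.

```python
import sqlite3
from datetime import datetime, timezone

# Column names follow the slide; types are assumptions for this sketch.
SCHEMA = """
CREATE TABLE gram_audit (
    job_grid_id       TEXT PRIMARY KEY,
    local_job_id      TEXT,
    submission_job_id TEXT,
    subject_name      TEXT,
    username          TEXT,
    creation_time     TEXT,
    queued_time       TEXT,
    stage_in_gid      TEXT,
    stage_out_gid     TEXT,
    clean_up_gid      TEXT,
    gt_version        TEXT,
    rm_type           TEXT,
    job_description   TEXT,
    success_flag      INTEGER
)
"""


def now():
    return datetime.now(timezone.utc).isoformat()


def record_created(conn, job_grid_id, subject_name, username, job_description):
    """Insert the audit record at the beginning of the job."""
    conn.execute(
        "INSERT INTO gram_audit (job_grid_id, subject_name, username,"
        " creation_time, job_description) VALUES (?, ?, ?, ?, ?)",
        (job_grid_id, subject_name, username, now(), job_description),
    )


def record_submitted(conn, job_grid_id, local_job_id):
    """Update the record once the LRM has accepted the job."""
    conn.execute(
        "UPDATE gram_audit SET local_job_id = ?, queued_time = ?"
        " WHERE job_grid_id = ?",
        (local_job_id, now(), job_grid_id),
    )


def record_terminated(conn, job_grid_id, success):
    """Final update when the job terminates."""
    conn.execute(
        "UPDATE gram_audit SET success_flag = ? WHERE job_grid_id = ?",
        (int(success), job_grid_id),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
gjid = "https://gram.example.org:8443/wsrf/services/ManagedExecutableJobService?example-key"
record_created(conn, gjid, "/O=Example/CN=Community User", "community", "/bin/true")
record_submitted(conn, gjid, "4711")
record_terminated(conn, gjid, True)
```

Writing the record early and updating it in place means a partial record exists even if the service stops before the job finishes.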

Access to LRM Accounting Information

TeraGrid uploads all LRM accounting information from each TG site to a central DB (TGCDB)

The OGSA-DAI service can be configured to access the remote TGCDB

Scalability in Storing Information/Records

Estimated that system should handle 100,000+ records

GRAM service inserts records directly into audit DB

Audit DB must be local to GRAM service to assure reliability

Implemented to use either PostgreSQL or MySQL

Secure Access

Standard authentication and authorization methods should be used to limit access to the audit and accounting information
– Clients must present a valid X.509 certificate
– Access can be controlled based on a range of policies

Current policy is to allow access iff the DN of the requestor matches the DN in the audit record
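As a rough illustration of the current policy, the check below filters audit records so that a requestor only sees records whose DN equals their own. It assumes the DN in the audit record is the subject_name field and that a plain string comparison is sufficient; a real deployment may need to normalize DNs first. The function names are hypothetical.

```python
def may_read(requestor_dn: str, record: dict) -> bool:
    """Allow access only when the requestor's DN equals the record's DN."""
    # Assumption: the record's DN is stored in the subject_name field.
    return requestor_dn == record["subject_name"]


def visible_records(requestor_dn, records):
    """Return only the audit records the requestor is allowed to read."""
    return [r for r in records if may_read(requestor_dn, r)]


records = [
    {"job_grid_id": "job-1", "subject_name": "/O=Example/CN=Gateway A"},
    {"job_grid_id": "job-2", "subject_name": "/O=Example/CN=Gateway B"},
]
mine = visible_records("/O=Example/CN=Gateway A", records)
```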

[Figure: architecture diagram. Labeled components: LEAD Gateway, Resource Provider Site, GT4 Java Container, WS GRAM, Delegation, RFT, OGSA-DAI, GRAM Audit Table, RFT Audit Table, Resource Manager, RM Accounting, Compute Cluster, AMIE, TG Central Accounting DB. Arrows are numbered 1 to 9.]

Sequence Description

1. Gateway submits job and gets an EPR in the reply
2. Gateway controls and monitors job with EPR
3. GRAM submits and monitors job in RM
4. GRAM inserts audit record at end of job
5. RM writes job accounting record
6. AMIE uploads RM accounting records to TGCDB. The RM accounting record is converted to TG accounting units.
7. Gateway locally converts EPR to GJID
8. Gateway calls OGSA-DAI getChargeForJob with GJID and gets the job usage in the reply
9. OGSA-DAI processes remote join between GRAM audit and TGCDB
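Steps 8 and 9 can be sketched as a single join query. The example below is a simplification under several assumptions: both tables live in one in-memory sqlite3 database instead of a local audit DB and the remote TGCDB, the TGCDB table name tg_accounting and its columns are invented for illustration, and the join key is assumed to be local_job_id. It is not the OGSA-DAI implementation.

```python
import sqlite3


def get_charge_for_job(conn, grid_job_id):
    """Return the TG units charged for a Grid Job ID, or None if unknown."""
    row = conn.execute(
        """
        SELECT a.charge
        FROM gram_audit AS g
        JOIN tg_accounting AS a ON a.local_job_id = g.local_job_id
        WHERE g.job_grid_id = ?
        """,
        (grid_job_id,),
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for the GRAM audit table (only the columns needed here).
    CREATE TABLE gram_audit (job_grid_id TEXT PRIMARY KEY, local_job_id TEXT);
    -- Hypothetical stand-in for the TGCDB accounting data.
    CREATE TABLE tg_accounting (local_job_id TEXT, charge REAL);
    """
)
gjid = "https://gram.example.org:8443/wsrf/services/ManagedExecutableJobService?example-key"
conn.execute("INSERT INTO gram_audit VALUES (?, ?)", (gjid, "4711"))
conn.execute("INSERT INTO tg_accounting VALUES (?, ?)", ("4711", 12.5))

print(get_charge_for_job(conn, gjid))
```

The gateway only ever handles the Grid Job ID; the mapping to the local job ID and the accounting lookup stay on the service side.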

