
ARDA Reports to the LCG SC2

L.A.T. Bauerdick for the RTAG-11/ARDA group

* Architectural Roadmap towards a Distributed Analysis

Presented at the GriPhyN Applications Meeting, ANL, Oct 15, 2003

ARDA Mandate

Mandate for the ARDA RTAG

• To review the current DA activities and to capture their architectures in a consistent way

• To confront these existing projects with the HEPCAL II use cases and the users' potential work environments in order to explore potential shortcomings

• To consider the interfaces between Grid, LCG and experiment-specific services
– Review the functionality of experiment-specific packages, their state of advancement and their role in the experiment
– Identify similar functionalities in the different packages
– Identify functionalities and components that could be integrated into the generic Grid middleware

• To confront the current projects with critical Grid areas

• To develop a roadmap specifying, wherever possible, the architecture, the components and potential sources of deliverables to guide the medium-term (2-year) work of the LCG and the DA planning in the experiments


ARDA Schedule and Makeup

Schedule and Makeup of ARDA RTAG

The RTAG shall provide a draft report to the SC2 by September 03.

• It should contain initial guidance to the LCG and the experiments to inform the September LHCC manpower review, in particular on the expected responsibilities of
– The experiment projects
– The LCG (development and interfacing work rather than coordination work)
– The external projects

The final RTAG report is expected for October 03.

The RTAG shall be composed of

• Two members from each experiment

• Representatives of the LCG GTA and AA

• If not included above, the RTAG shall co-opt or invite representatives from the major Distributed Analysis projects and non-LHC running experiments with DA experience.

• Alice: Fons Rademakers and Predrag Buncic
• Atlas: Roger Jones and Rob Gardner
• CMS: Lothar Bauerdick and Lucia Silvestris
• LHCb: Philippe Charpentier and Andrei Tsaregorodtsev

• LCG GTA: David Foster, stand-in Massimo Lamanna
• LCG AA: Torre Wenaus
• GAG: Federico Carminati

Done

Done


ARDA Mode of Operation

Thank you for an excellent committee -- large expertise, agility and responsiveness, very constructive and open-minded, and sacrificing quite a bit of the summer.

Series of weekly meetings in July and August, mini-workshop in September.

Invited talks from existing experiment projects:

• Summary of Caltech GAE workshop (Torre)
• PROOF (Fons)
• AliEn (Predrag)
• DIAL (David Adams)
• GAE and Clarens (Conrad Steenberg)
• Ganga (Pere Mato)
• Dirac (Andrei)

Cross-check with other projects of the emerging ARDA decomposition of services:
• Magda, DIAL -- Torre, Rob
• EDG, NorduGrid -- Andrei, Massimo
• SAM, MCRunjob -- Roger, Lothar
• BOSS, MCRunjob -- Lucia, Lothar
• Clarens, GAE -- Lucia, Lothar
• Ganga -- Rob, Torre
• PROOF -- Fons
• AliEn -- Predrag
• DIRAC -- Andrei
• VOX -- Lothar

Done

Done



Initial Picture of Distributed Analysis (Torre, Caltech workshop)


HEPCAL-II Analysis Use Cases

Scenarios based on the GAG HEPCAL-II report

Determine datasets and possibly event components
• Input data are selected via a query to a metadata catalogue

Perform iterative analysis activity

• Selection and algorithm are passed to a workload management system, together with a specification of the execution environment

•Algorithms are executed on one or many nodes

•User monitors progress of job execution

•Results are gathered together and passed back to the job owner

•Resulting datasets can be published to be accessible to other users

Specific requirements from HEPCAL-II: job traceability, provenance, logbooks

Also discussed: support for finer-grained access control and for sharing data within physics groups


Analysis Scenario

This scenario represents the analysis activity from the user's perspective. However, other actions take place behind the scenes of the user interface:

To carry out the analysis tasks, users access shared computing resources. To do so, they must be registered with their Virtual Organization (VO) and authenticated, and their actions must be authorized according to their roles within the VO.

The user specifies the necessary execution environment (software packages, databases, system requirements, etc.) and the system ensures it on the execution node. In particular, the necessary environment can be installed according to the needs of a particular job.

The execution of the user job may trigger transfers of various datasets between a user interface computer, execution nodes and storage elements. These transfers are transparent to the user.



Example: Asynchronous Analysis

Running Grid-based analysis from inside ROOT (adapted from an AliEn example)
● ROOT calling the ARDA API from the command prompt:

// connect and authenticate to the Grid service "arda" as "lucia"
TGrid *arda = TGrid::Connect("arda", "lucia", "", "");

// create a new analysis object (<unique ID>, <title>, #subjobs)
TArdaAnalysis *analysis = new TArdaAnalysis("pass001", "MyAnalysis", 10);

// set the program which executes the analysis macro/script
analysis->Exec("ArdaRoot.sh", "file:/home/vincenzo/test.C");  // script to execute

// set up the event metadata query
analysis->Query("2003-09/V6.08.Rev.04/00110/%gjetmet.root?pt>0.2");

// specify job splitting and run
analysis->OutputFileAutoMerge(true);  // merge all produced .root files
analysis->Split();                    // split the task into subjobs
analysis->Run();                      // submit all subjobs to the ARDA queue

// asynchronously, at any time, get the (partial or complete) results
analysis->GetResults();  // download partial/final results and merge them
analysis->Info();        // display job information


Asynchronous Analysis Model

Extract a subset of the datasets from the virtual file catalogue using metadata conditions provided by the user.

Split the tasks according to the location of datasets. A trade-off has to be found between the best use of available resources and minimal data movement. Ideally, jobs should be executed where the data are stored. Since one cannot expect a uniform storage location distribution for every subset of data, the analysis framework has to negotiate with dedicated Grid services the balance between local data access and data replication.

Spawn sub-jobs and submit to Workload Management with precise job descriptions

The user can check the results while and after the data are processed.

Collect and merge available results from all terminated sub-jobs on request.

Analysis objects associated with the analysis task remain persistent in the Grid environment, so the user can go offline and reload an analysis task at a later date, check the status, merge current results or resubmit the same task with modified analysis code.
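
To make the splitting step above concrete, here is a minimal C++ sketch (not from the ARDA report; the type and function names are hypothetical) that groups the selected datasets by the Storage Element holding them, so that each sub-job runs where its data reside:

#include <map>
#include <string>
#include <vector>

// one entry per selected dataset: its logical name and where a replica lives
struct Dataset { std::string lfn; std::string storageElement; };

// one sub-job per site, with the inputs it should process locally
struct SubJob { std::string site; std::vector<std::string> inputs; };

// Group the selected datasets by the Storage Element that holds them, so that
// each sub-job can be scheduled close to (most of) its input data.
std::vector<SubJob> SplitByLocation(const std::vector<Dataset>& selection)
{
   std::map<std::string, SubJob> bySite;
   for (const Dataset& d : selection) {
      SubJob& job = bySite[d.storageElement];
      job.site = d.storageElement;
      job.inputs.push_back(d.lfn);
   }
   std::vector<SubJob> jobs;
   for (const auto& entry : bySite) jobs.push_back(entry.second);
   return jobs;
}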


Synchronous Analysis

Scenario: using PROOF in the Grid environment

Parallel ROOT Facility, main developer Maarten Ballintijn / MIT

PROOF already provides a ROOT-based framework to use (local) cluster computing resources

• dynamically balances the workload, with the goal of optimizing CPU exploitation and minimizing data transfers

•makes use of the inherent parallelism in event data

•works in heterogeneous clusters with distributed storage

Extend this to the Grid using interactive analysis services that could be based on the ARDA services
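
For reference, a minimal ROOT macro sketch of the (local) cluster PROOF usage described above; the master host, input file and selector names are placeholders, not taken from the report:

#include "TProof.h"
#include "TChain.h"

void proof_sketch()
{
   // connect to the PROOF master of a (local) cluster
   TProof *proof = TProof::Open("proofmaster.example.org");

   // a chain of event trees; PROOF exploits the inherent per-event parallelism
   TChain chain("Events");
   chain.Add("root://se.example.org//data/run00110/gjetmet.root");

   // route the processing through the PROOF workers and run a selector there
   chain.SetProof();
   chain.Process("MySelector.C+");

   delete proof;
}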



ARDA Roadmap “Informed” By DA Implementations

Following SC2 advice, reviewed major existing DA projects

Clearly AliEn today provides the most complete implementation of a distributed analysis service that is "fully functional" -- also interfaces to PROOF

Implements the major HEPCAL-II use cases

Presents a clean API to experiment applications, Web portals, …

Should address most requirements for upcoming experiments' physics studies
• Existing and fully functional interface to a complete analysis package --- ROOT
• Interface to the PROOF cluster-based interactive analysis system
• Interfaces to any other system well defined and certainly feasible

Based on Web services, with a global (federated) database to give state and persistency to the system

ARDA approach:

Re-factoring AliEn, using the experience of the other projects, to generalize it into an architecture; consider OGSI as a natural foundation for that

Confront ARDA services with existing projects (notably EDG, SAM, Dirac, etc.)

Synthesize service definitions, defining their contracts and behavior

Blueprint for an initial distributed analysis service infrastructure


ARDA Distributed Analysis Services

Distributed Analysis in a Grid Services based architecture

ARDA services should be OGSI compliant -- built upon OGSI middleware

Frameworks and applications use the ARDA API, with bindings to C++, Java, Python, Perl, …
• interface through the UI/API factory -- authentication, persistent "session"

Fabric interface to resources through CE, SE services
• job description language based on Condor ClassAds and matchmaking

Database(s), accessed through a DB Proxy, provide statefulness and persistence

We arrived at a decomposition into the following key services:
• API and User Interface
• Authentication, Authorization, Accounting and Auditing services
• Workload Management and Data Management services
• File and (event) Metadata Catalogues
• Information service
• Grid and Job Monitoring services
• Storage Element and Computing Element services
• Package Manager and Job Provenance services


[Diagram: ARDA service decomposition, labelled "AliEn (re-factored)". It shows the API / User Interface Factory, User Interface, Authentication, Authorisation, Auditing, Accounting, V.O. directory, Registry/Lookup/Config, File Catalogue, Metadata Catalogue, Task Queue, Job Broker, Job Optimizer, Job Manager, Transfer Broker, Transfer Optimizer, Transfer Manager, File Transfer, Catalogue Optimiser, Process Monitor, Gatekeeper, Computing Element (CE), Storage Element, Grid Monitoring, Package Manager, Job Provenance and a DB Proxy over a DBD/RDBMS backend, with UML-style multiplicities and the client interaction sequence 1. lookup, 2. authenticate, 3. register, 4. bind.]


ARDA Key Services for Distributed Analysis

[Diagram: numbered interaction sequence among the ARDA key services: API, User Interface, Authentication, Authorisation, Auditing, Accounting, Information Service, Grid Monitoring, Workload Management, Data Management, Metadata Catalogue, File Catalogue, Computing Element, Storage Element, Job Monitor, Job Provenance, Package Manager and DB Proxy.]



ARDA HEPCAL matching: an example

HEPCAL-II Use Case: Group Level Analysis (GLA)

• User specifies job information, including:
• Selection criteria;

• Metadata Dataset (input);

• Information about s/w (library) and configuration versions

• Output AOD and/or TAG Dataset (typical);

• Program to be run;

• User submits job;

• Program is run;

• Selection Criteria are used for a query on the Metadata Dataset;

• Event ID satisfying the selection criteria and Logical Dataset Name of corresponding Datasets are retrieved;

• Input Datasets are accessed;

• Events are read;

• Algorithm (program) is applied to the events;

• Output Datasets are uploaded;

• Experiment Metadata is updated;

• Report summarizing the output of the jobs is prepared for the group (e.g. how many events to which stream, ...), extracting the information from the application and the Grid middleware

ARDA services exercised at the successive steps:
• Authentication, Authorization, Metadata catalog
• Workload management
• Metadata catalog
• File catalog, Data management, Storage Element
• Package manager, Computing element
• File catalog, Data management, Storage Element
• Metadata catalog, Job provenance
• Auditing


API to Grid Services

ARDA services present an API, called by applications like the experiment frameworks, interactive analysis packages, Grid portals, Grid shells, etc.

In particular, the importance of the UI/API:

Interface services to higher-level software

• Exp. framework

• Analysis shells, e.g. ROOT

• Grid portals and other forms of user interactions with environment

• Advanced services, e.g. virtual data, analysis logbooks, etc.

Provide experiment-specific services

• Data and Metadata management systems

Provide an API that others can program against

Benefits of a common API to the frameworks

• Goes beyond “traditional” UIs à la GANGA, Grid portals, etc.

Benefits in interfacing to analysis applications like ROOT et al.

Process to get a common API between experiments --> prototype

The UI/API can use the Condor ClassAds as a Job Description Language

• This will maintain compatibility with existing job execution services, in particular LCG-1.
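
For illustration, a minimal sketch of what such a ClassAd-based job description could look like, in EDG-style JDL; all attribute values are placeholders and not taken from the report:

Executable    = "ArdaRoot.sh";
Arguments     = "test.C";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"ArdaRoot.sh", "test.C"};
OutputSandbox = {"std.out", "std.err", "histos.root"};
Requirements  = other.GlueCEPolicyMaxCPUTime > 720;
Rank          = -other.GlueCEStateEstimatedResponseTime;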


API and User Interface

[Diagram: the API, realized as an OGSI User Interface Factory, exposes Authentication, Data Management, Grid Service Management, Job Control, Metadata Management and POSIX I/O interfaces. Experiment frameworks (POOL/ROOT/...), portals and Grid shells talk to it via SOAP; Grid file access goes through the Storage Element's POSIX I/O service and a Grid file system.]



File Catalogue and Data Management

Input and output associated with any job can be registered in the VO's File Catalogue, a virtual file system in which a logical name is assigned to a file.

Unlike real file systems, the File Catalogue does not own the files; it only keeps an association between the Logical File Name (LFN) and (possibly more than one) Physical File Names (PFNs) on a real file or mass storage system. PFNs describe the physical location of the files and include the name of the Storage Element and the path to the local file.

The system should support file replication and caching and will use file location information when it comes to scheduling jobs for execution.

The directories and files in the File Catalogue have privileges for owner, group and the world. This means that every user can have exclusive read and write privileges for his portion of the logical file namespace (home directory).
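
To make the LFN/PFN association concrete, a small C++ sketch of what a File Catalogue entry might hold; the type and field names are hypothetical, not the report's schema:

#include <string>
#include <vector>

// one physical replica of a file
struct PhysicalFileName {
   std::string storageElement;   // e.g. "se01.cern.ch"
   std::string path;             // path on the Storage Element / mass storage system
};

// one logical entry in the VO's File Catalogue
struct CatalogueEntry {
   std::string lfn;                         // logical file name, e.g. "/vo/prod/00110/gjetmet.root"
   std::vector<PhysicalFileName> replicas;  // possibly more than one PFN per LFN
   int ownerPerm, groupPerm, worldPerm;     // privileges for owner, group and the world
};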


Job Provenance Service

The File Catalogue is extended to include information about running processes in the system (in analogy with the /proc directory on Linux systems) and to support virtual data services.

Each job sent for execution gets a unique id and a corresponding /proc/id directory where it can register temporary files, standard input and output as well as all job products. In a typical production scenario, the job products are renamed and registered in their final destination in the File Catalogue only after a separate process has verified the output. The entries (LFNs) in the File Catalogue have an immutable unique file id attribute, which is required to support long references (for instance in ROOT) and symbolic links.
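
As an illustration (the job id and file names are made up), a running job might register its temporary products like this before a verification step renames them into their final catalogue destination:

/proc/001234/stdout
/proc/001234/stderr
/proc/001234/histos.root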


Package Manager Service

Allows dynamic installation of application software released by the VO (e.g. the experiment or a physics group).

Each VO can provide the Packages and Commands that can be subsequently executed. Once the corresponding files with bundled executables and libraries are published in the File Catalogue and registered, the Package Manager will install them automatically as soon as a job becomes eligible to run on a site whose policy accepts these jobs.

While installing the package in a shared package repository, the Package Manager will resolve the dependencies on other packages and, taking into account package versions, install them as well. This means that old versions of packages can be safely removed from the shared repository and, if these are needed again at some point later, they will be re-installed automatically by the system. This provides a convenient and automated way to distribute the experiment specific software across the Grid and assures accountability in the long term.
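
A minimal C++ sketch of what such a Package Manager interface could look like; the class and method names are hypothetical, not from the report:

#include <string>
#include <vector>

// Hypothetical interface sketch for the Package Manager service described above.
class PackageManager {
public:
   virtual ~PackageManager() {}

   // Install <package, version> into the shared repository if not already present,
   // resolving and installing its dependencies as well; returns the location of
   // the installed package (e.g. a setup script to source before the job starts).
   virtual std::string Install(const std::string& package,
                               const std::string& version) = 0;

   // Remove an old version from the shared repository; it can be re-installed
   // automatically later if a job needs it again.
   virtual void Remove(const std::string& package, const std::string& version) = 0;

   // List the packages currently installed in the shared repository.
   virtual std::vector<std::string> ListInstalled() const = 0;
};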


Computing Element

The Computing Element is a service representing a computing resource. Its interface should allow submission of a job to be executed on the underlying computing facility, access to job status information, as well as high-level job manipulation commands. The interface should also provide access to the dynamic status of the computing resource, such as its available capacity, load, and number of waiting and running jobs.

This service should be available on a per VO basis.
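
A minimal C++ sketch of the Computing Element interface outlined above; the type and method names are illustrative, not the report's definition:

#include <string>

// snapshot of the dynamic status of the computing resource
struct ResourceStatus {
   int availableSlots;   // available capacity
   double load;          // current load
   int waitingJobs;
   int runningJobs;
};

// Hypothetical Computing Element service interface.
class ComputingElement {
public:
   virtual ~ComputingElement() {}

   // submit a job (described e.g. by a ClassAd-based JDL) for execution on the
   // underlying computing facility; returns a job identifier
   virtual std::string Submit(const std::string& jobDescription) = 0;

   // access to job status information and high-level job manipulation
   virtual std::string GetStatus(const std::string& jobId) = 0;
   virtual void Cancel(const std::string& jobId) = 0;

   // dynamic status of the computing resource
   virtual ResourceStatus Query() const = 0;
};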

Etc.; the report describes the remaining services in a similar way.


Talking Points

Horizontally structured system of services with a well-defined API and a database backend
• Can easily be extended with additional services; new implementations can be moved in, alternative approaches tested and commissioned

Interface to LCG-1 infrastructure
• VDT/EDG interface through CE, SE and the use of JDL, compatible with the existing infrastructure
• ARDA VO services can build on the emerging VO management infrastructure

ARDA initially looked at file-based datasets, not object collections
• talk with POOL about how to extend the file concept to a more generic collection concept
• investigate the experiments' metadata/file-catalog interaction

VO system and site security
• Jobs are executed on behalf of the VO; however, users are fully traceable
• How do policies get implemented, e.g. analysis priorities, MoU contributions, etc.
• Auditing and accounting system; priorities through special "optimizers"
• Accounting of site "contributions", which depend on what resources sites "expose"

Database backend for the prototype
• Address latency, stability and scalability issues up front; good experience exists
• In a sense, the system is the database (possibly federated and distributed) that contains all there is to know about all jobs, files, metadata and algorithms of all users within a VO
• a set of OGSI grid services provides "windows"/"views" into the database, while the API provides the user access
• allows structuring into federated grids and "dynamic workspaces"


General ARDA Roadmap

Emerging picture of "waypoints" on the ARDA "roadmap"

ARDA RTAG report
• review of existing projects, common architecture, component decomposition & re-factoring
• recommendations for a prototypical architecture and definition of prototypical functionality and a development strategy

Development of a prototype and first release

• Integration with and deployment on LCG-1 resources and services

Re-engineering of prototypical ARDA services, as required

OGSI gives a framework in which to run ARDA services

Addresses the architecture

Provides a framework for advanced interactions with the Grid

Need to address issues of OGSI performance and scalability up front
• Importance of modeling, a plan for scaling up, engineering of the underlying services infrastructure


Roadmap to a GS Architecture for the LHC

Transition to grid services explicitly addressed in several existing projects

Clarens and Caltech GAE, MonALISA
• Based on web services for communication, Jini-based agent architecture

Dirac

•Based on “intelligent agents” working within batch environments

AliEn

•Based on web services and communication to distributed database backend

DIAL

• OGSA interfaces

Initial work on OGSA within LCG-GTA

•GT3 prototyping

Leverage the experience gained in Grid M/W R&D projects


ARDA Roadmap for Prototype

No "evolutionary" path from GT2-based grids

Recommendation: build a prototype early, based on re-factoring existing implementations

Prototype provides the initial blueprint

Do not aim for a full specification of all the interfaces

Four-pronged approach:

Re-factoring of AliEn, Dirac and possibly other services into ARDA

• Initial release with OGSI::Lite/GT3 proxy, consolidation of API, release

• Implementation of agreed interfaces, testing, release

GT3 modeling and testing, ev. quality assurance

Interfacing to LCG-AA software like POOL, analysis shells like ROOT

• Also an opportunity for "early" interfacing to complementary projects

Interfacing to experiment frameworks
• metadata handlers, experiment-specific services

Provide interaction points with the community



Experiments and LCG Involved in Prototyping

ARDA prototype would define the initial set of services and their interfaces. Timescale: spring 2004

Important to involve experiments and LCG at the right level
• Initial modeling of GT3-based services
• Interface to major cross-experiment packages: POOL, ROOT, PROOF, others
• Program experiment frameworks against the ARDA API, integrate with experiment environments
• Expose services and the UI/API to other LHC projects to allow synergies
• Spend appropriate effort to document, package, release, deploy

After the prototype is delivered, improve on it:
• Scale up and re-engineer as needed: OGSI, databases, information services
• Deployment and interfaces to site and grid operations, VO management, etc.
• Build higher-level services and experiment-specific functionality
• Work on interactive analysis interfaces and new functionalities



Major Role for Middleware Engineering

The ARDA roadmap is based on a well-factored prototype implementation that allows evolutionary development into a complete system at the full LHC scale

ARDA prototype would be pretty lightweight
• Stability through building on a global database to which services talk through a database proxy
• "people know how to do large databases" -- a well-founded principle (see e.g. SAM for Run II), with many possible migration paths
• HEP-specific services, however, based on generic OGSI-compliant services

Expect the LCG/EGEE middleware effort to play a major role in evolving this foundation, concepts and implementation
• re-casting the (HEP-specific, event-data-analysis oriented) services into more general services, from which the ARDA services would be derived
• addressing major issues like a solid OGSI foundation, robustness, resilience, fault recovery, operation and debugging

Expect US middleware projects to be involved in this!


Conclusions

ARDA is identifying a service-oriented architecture and an initial decomposition of the services required for distributed analysis

Recognize a central role for a Grid API which provides a factory of user interfaces for experiment frameworks, applications, portals, etc

The ARDA prototype would provide a distributed physics analysis environment over distributed experimental data, for experiment-framework-based analysis
• Cobra, Athena, Gaudi, AliRoot
and for ROOT-based analysis; interfacing to other analysis packages like JAS, event displays like Iguana, grid portals, etc. can be implemented easily

