
The High Energy Physics Community Grid Project

Inside D-Grid

ACAT 07 - Torsten Harenberg, University of Wuppertal

harenberg@physik.uni-wuppertal.de

2/27

D-Grid organisational structure

3/27

Technical infrastructure

[Diagram: the D-Grid technical infrastructure. Users and communities (data/software) reach the D-Grid resources through a user API, a GridSphere-based portal and the GAT API. The D-Grid services layer comprises the middleware (UNICORE, Globus Toolkit V4, LCG/gLite), scheduling and workflow management, data management, accounting and billing, monitoring, I/O, and security and VO management, sitting on core services, distributed data services, distributed computing resources and the network.]

4/27

HEP Grid efforts since 2001

[Timeline 2000-2010: EDG, followed by EGEE, EGEE 2 and possibly EGEE 3; LCG R&D followed by the WLCG ramp-up; GridKa / GGUS throughout; the D-Grid Initiative with DGI, DGI 2 and HEP CG; planned LHC pp run March-September and heavy-ion run in October; "today" marks 2007.]

5/27

LHC groups in Germany

Alice: Darmstadt, Frankfurt, Heidelberg, Münster

ATLAS: Berlin, Bonn, Dortmund, Dresden, Freiburg, Gießen, Heidelberg, Mainz, Mannheim, München, Siegen, Wuppertal

CMS: Aachen, Hamburg, Karlsruhe

LHCb: Heidelberg, Dortmund

6/27

German HEP institutes participating in WLCG

WLCG: Karlsruhe (GridKa & Uni), DESY, GSI, München, Aachen, Wuppertal, Münster, Dortmund, Freiburg

7/27

HEP CG participants:

Participants: Uni Dortmund, TU Dresden, LMU München, Uni Siegen, Uni Wuppertal, DESY (Hamburg & Zeuthen), GSI

Associated partners: Uni Mainz, HU Berlin, MPI f. Physik München, LRZ München, Uni Karlsruhe, MPI Heidelberg, RZ Garching, John von Neumann Institut für Computing, FZ Karlsruhe, Uni Freiburg, Konrad-Zuse-Zentrum Berlin

8/27

HEP Community Grid

WP 1: Data management (dCache)

WP 2: Job Monitoring and user support

WP 3: Distributed data analysis (Ganga)

==> Joint venture between physics and computer science

9/27

WP 1: Data management. Coordination: Patrick Fuhrmann

An extensible metadata catalogue for semantic data access:

Central service for gauge theory

DESY, Humboldt Uni, NIC, ZIB

A scalable storage element:

Using dCache on multi-scale installations.

DESY, Uni Dortmund E5, FZK, Uni Freiburg

Optimized job scheduling in data-intensive applications:

Data and CPU Co-scheduling

Uni Dortmund CEI & E5

10/27

WP 1: Highlights

Establishing a metadata catalogue for gauge theory

Production service of a metadata catalogue with > 80,000 documents.

Tools to be used in conjunction with LCG data grid

Well established in international collaboration

http://www-zeuthen.desy.de/latfor/ldg/

Advancements in data management with new functionality

dCache could become a quasi-standard in WLCG

Good documentation and an automatic installation procedure provide usability from small Tier-3 installations up to Tier-1 sites.

High throughput for large data streams, optimization based on quality and load of the disk storage systems, and high-performance access to tape systems

11/27

dCache-based scalable storage element

dCache project well established

New since HEP CG:

Professional product management, i.e. code versioning, packaging, user support and test suites.

Scales from a single-host installation (~10 terabytes, zero maintenance) up to thousands of pools with well over a petabyte of disk storage and more than 100 file transfers per second, operated with less than 2 FTEs.

12/27

dCache: principle

[Diagram: dCache architecture. Protocol engines handle streaming data ((gsi)FTP, http(g)) and POSIX I/O (xRootd, dCap); storage control goes through SRM and an information protocol. The dCache controller manages the disk storage and is connected to backend tape storage via an HSM adapter.]
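To make the POSIX I/O path concrete: from a worker node, an analysis job can read a file held in dCache through the dCap door with ordinary ROOT calls. The PyROOT lines below are only a sketch; the door host, port, /pnfs path and tree name are invented placeholders, and a ROOT build with the dCap plugin is assumed.

# Sketch: reading a ROOT file from dCache via the dCap door (PyROOT).
# Host, port, path and tree name are illustrative placeholders.
import ROOT

url = "dcap://dcap-door.example.org:22125/pnfs/example.org/data/user/events.root"
f = ROOT.TFile.Open(url)          # dispatches to the dCap plugin for dcap:// URLs
if f and not f.IsZombie():
    tree = f.Get("events")        # assumed tree name
    print("entries:", tree.GetEntries())
    f.Close()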

13/27

dCache: connection to the Grid world

[Diagram: the storage element sits inside the site, behind the firewall, next to the compute element. Within the site, jobs access data via dCap/rfio/root and the storage element publishes itself to the information system. From outside the site, data arrive and leave via gsiFTP, negotiated through SRM (the Storage Resource Manager protocol), with wide-area transfers organized in FTS (File Transfer Service) channels.]
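As a rough sketch of the out-of-site path, a client lets SRM negotiate the transfer with the storage element and then pulls the data over gsiFTP. The snippet below simply shells out to the srmcp client from the dCache SRM tools; the endpoint and paths are placeholders, and a valid Grid proxy (e.g. from voms-proxy-init) is assumed.

# Sketch: copying a file out of a dCache storage element via SRM/gsiFTP.
# The SRM endpoint and file paths are placeholders; a Grid proxy is assumed.
import subprocess

src = "srm://dcache-se.example.org:8443/pnfs/example.org/data/user/events.root"
dst = "file:////tmp/events.root"

# srmcp asks the SRM door for a transfer URL and then moves the data
# over a supported transfer protocol such as gsiFTP.
subprocess.run(["srmcp", src, dst], check=True)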

14/27

dCache: achieved goals

Development of the xRoot protocol for distributed analysis

Small sites: automatic installation and configuration ("dCache in 10 minutes")

Large sites (> 1 Petabyte):

Partitioning of large systems.

Transfer optimization from / to tape systems

Automatic file replication (freely configurable)

15/27

dCache: Outlook

Current usage

7 Tier-1 centres with up to 900 terabytes on disk per centre, plus tape systems (Karlsruhe, Lyon, RAL, Amsterdam, Fermilab, Brookhaven, NorduGrid)

~30 Tier-2 centres, including all US CMS sites; planned for US ATLAS.

Planned usage

dCache is going to be included in the Virtual Data Toolkit (VDT) of the Open Science Grid as the proposed storage element in the USA.

The planned US Tier-1 will break the 2 PB boundary by the end of the year.

16/27

HEP Community Grid

WP 1: Data management (dCache)

WP 2: Job Monitoring and user support

WP 3: Distributed data analysis (Ganga)

==> Joint venture between physics and computer science

17/27

WP 2: Job monitoring and user support. Coordination: Peter Mättig (Wuppertal)

Job monitoring and resource usage visualizer:

TU Dresden

Expert system classifying job failures:

Uni Wuppertal, FZK, FH Köln, FH Niederrhein

Online job steering:

Uni Siegen

18/27

Job monitoring and resource usage visualizer

[Diagram: on each worker node, monitoring sensors and a stepwise Job Execution Monitoring component run alongside the user's physics application. Their data are published through a monitoring box into R-GMA. An analysis web service (e.g. an R-GMA consumer) interfaces to the monitoring systems, and a GridSphere portal server with a monitoring portlet serves a visualisation applet to the user's browser, offering interactivity, overviews, details, timelines and histograms.]

19/27

Integration into GridSphere

20/27

Job Execution Monitor in LCG

[Diagram: LCG job states - submitted, waiting, ready, scheduled, running, then done (ok) or done (failed), cleared, cancelled, aborted. While a job is in the running state, the user cannot tell what is going on.]

Motivation

1000s of jobs each day in LCG

Job status unknown while running

Manual error detection: slow and difficult

GridICE, ...: service/hardware-based monitoring

Conclusion

Monitor the job while it is running ==> JEM

Automatic error detection needed ==> expert system

21/27

JEM: Job Execution Monitor

gLite/LCG worker node: pre-execution test

Script monitoring (Bash, Python)

Information exchange: R-GMA

Visualization: e.g. GridSphere

Expert system for classification

Integration into ATLAS

Integration into GGUS

post D-Grid I: ... ?
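The stepwise script monitoring can be pictured as a wrapper that executes the user's job command by command and reports each step. The toy Python sketch below is not JEM code and does not use R-GMA; it only illustrates the idea under that assumption.

# Toy illustration of stepwise script monitoring (not actual JEM code).
import subprocess
import time

def run_stepwise(commands):
    """Run shell commands one at a time and report each step's outcome."""
    for number, cmd in enumerate(commands, start=1):
        started = time.time()
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        status = "ok" if result.returncode == 0 else "failed (%d)" % result.returncode
        # In JEM the report would be published to the monitoring system;
        # here it is simply printed.
        print("step %d: %r -> %s after %.1fs" % (number, cmd, status, time.time() - started))
        if result.returncode != 0:
            break  # stop at the first failing step

if __name__ == "__main__":
    run_stepwise(["echo setup", "ls /nonexistent/path", "echo analysis"])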

22/27

JEM - status

Monitoring part ready for use

Integration into GANGA (ATLAS/LHCb distributed analysis tool) ongoing

Connection to GGUS planned

http://www.grid.uni-wuppertal.de/jem/

23/27

HEP Community Grid

WP 1: Data management (dCache)

WP 2: Job Monitoring and user support

WP 3: Distributed data analysis (Ganga)

==> Joint venture between physics and computer science

24/27

WP 3: Distributed data analysis. Coordination: Peter Malzacher (GSI Darmstadt)

GANGA: distributed analysis @ ATLAS and LHCb

Ganga is an easy-to-use frontend for job definition and management

Python, IPython or GUI interface

Analysis jobs are automatically split into subjobs, which are sent to multiple sites in the Grid

Data management for input and output; distributed output is collected.

Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid)

Developed in the context of ATLAS and LHCb

Implemented in Python
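A minimal sketch of what such a Ganga session can look like is given below. It assumes a Ganga installation where Job, Executable, ArgSplitter and the Local and LCG backends are available in the interactive shell; the executable and arguments are placeholders, and exact class names may vary between Ganga releases.

# Sketch of a Ganga session (run inside the ganga shell, where Job,
# Executable, ArgSplitter, Local and LCG are predefined objects).
j = Job(name="toy-analysis")
j.application = Executable(exe="/bin/echo", args=["run"])

# One subjob per argument list; Ganga submits them and collects the outputs.
j.splitter = ArgSplitter(args=[["part-1"], ["part-2"]])

# Switching between a local test and the Grid only changes the backend:
j.backend = Local()      # quick test on the local machine
# j.backend = LCG()      # same job, sent to the LCG/gLite Grid

j.submit()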

25/27

GANGA schema

[Diagram: the user's analysis code (myAna.C) and a catalog query define the input; the data files are split into subjobs, which the job manager submits to the queues; the subjobs read their files from storage, and their outputs are collected and merged for the final analysis.]

26/27

PROOF schema

[Diagram: a PROOF query (data file list plus myAna.C) is sent to the master, which uses the catalog and scheduler to distribute the work; the workers read the files from storage, send feedback while running, and the final outputs come back merged.]
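For comparison with the Ganga picture, a PROOF query can be driven from PyROOT roughly as sketched below; the master address, file URLs and tree name are placeholders, and myAna.C is assumed to be written as a TSelector.

# Sketch: running myAna.C over a file list with PROOF (PyROOT).
# Master address, file URLs and tree name are illustrative placeholders.
import ROOT

proof = ROOT.TProof.Open("proof-master.example.org")   # connect to the master

chain = ROOT.TChain("events")                           # assumed tree name
chain.Add("root://xrootd.example.org//data/user/events_1.root")
chain.Add("root://xrootd.example.org//data/user/events_2.root")

chain.SetProof()            # route processing through the PROOF master
chain.Process("myAna.C+")   # workers run the TSelector; outputs come back merged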

27/27

HEPCG: summary

Physics departments: DESY, Dortmund, Dresden, Freiburg, GSI, München, Siegen, Wuppertal

Computer science: Dortmund, Dresden, Siegen, Wuppertal, ZIB, FH Köln, FH Niederrhein

D-Grid is Germany's contribution to HEP computing: dCache, monitoring, distributed analysis.

The effort will continue.

2008: start of LHC data taking, a challenge for the Grid concept

==> new tools and developments needed