Data Management on the GRID
Peter Z. Kunszt, CERN Database Group, EU DataGrid – Data Management
P.Kunszt, LCGP, 13.3.2002

Page 1


Data Management on the GRID

Peter Z. Kunszt, CERN Database Group

EU DataGrid – Data Management

Page 2

Personal Information

• PhD in theoretical physics (Lattice QCD) at the University of Bern
• ‘Builder of the SDSS Project’ – design and implementation work on the SDSS science archive (SX), on both Objectivity and MS SQL Server
• CERN Database Group
• Activity task leader for Grid Data Management
• Management of WP2 (Data Management) of the EDG Project

Page 3

Scope of Data Management

• Data Transfer
  – Transport protocols
• Data Access
  – Remote I/O
  – Security / Policies
• Data Storage
  – Hierarchical Storage
  – Mass Storage
• Replication
  – Peer-to-Peer
  – Centralized
  – Distributed
  – Automatic
• Metadata management
  – Scalable
  – Distributed
  – Consistent
• Persistency
  – Grid-enabled databases and data stores
  – Independent of back-end implementation
• Optimisation
  – Data Access optimisation
  – Cost minimisation

Page 4

Vision of Grid Data Management

• Distributed Shared Data Storage
• Ubiquitous Data Access
• Transparent Data Transfer and Migration
• Consistency and Robustness
• Optimisation

Page 5

Vision of Grid Data Management

Distributed Shared Data Storage
  – Different architectures
  – Heterogeneous data stores
  – Self-describing data and metadata

Page 6

Vision of Grid Data Management

Ubiquitous Data Access
  – Global Namespace
  – Transparent security control and enforcement
  – Access from anywhere at any time; physical data location is irrelevant
  – Automatic Data Replication and Validation
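A global namespace is usually realised by a catalog that maps a location-independent logical file name (LFN) to the physical file names (PFNs) of its replicas. The minimal in-memory sketch below illustrates that mapping only; the class and method names and the gsiftp example URLs are hypothetical and do not reproduce the Globus or EDG Replica Catalog interfaces.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Hypothetical in-memory catalog: one logical name, many physical copies."""

    def __init__(self) -> None:
        # LFN -> set of PFNs (physical locations of identical copies)
        self._replicas: dict[str, set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        """Record that a physical copy of `lfn` exists at `pfn`."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn: str, pfn: str) -> None:
        """Forget one copy; drop the LFN entirely once no copies remain."""
        copies = self._replicas.get(lfn)  # .get() does not create an entry
        if copies is None:
            return
        copies.discard(pfn)
        if not copies:
            del self._replicas[lfn]

    def lookup(self, lfn: str) -> list[str]:
        """Return all known physical locations, sorted for stable output."""
        return sorted(self._replicas.get(lfn, set()))
```

For example, registering gsiftp://cern.example/data/run42.raw and gsiftp://ral.example/data/run42.raw under lfn:run42.raw lets a client ask for the logical name only and then choose among both copies.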

Page 7

Vision of Grid Data Management

Transparent Data Transfer and Migration
  – Protocol negotiation and multiple protocol support
  – Management of data formats and database versions

Page 8

Vision of Grid Data Management

Consistency and Robustness
  – Replicated data is reasonably up-to-date
  – Reliable data transfer
  – Self-detecting and self-correcting mechanisms upon data corruption
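One common way to make corruption self-detecting is to store a checksum with each catalog entry, verify every replica against it, and re-copy a damaged replica from an intact one. The sketch below assumes replicas are local files and uses MD5 from Python's hashlib; the function names are hypothetical, and the slide does not say which checksum or repair policy a grid service would actually use.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_repair(replicas: list[Path], expected: str) -> list[Path]:
    """Check every replica against the catalogued checksum.

    Corrupted copies are overwritten from the first replica that still
    matches; returns the replicas that had to be repaired.
    """
    good = [p for p in replicas if file_checksum(p) == expected]
    if not good:
        raise RuntimeError("no intact replica left to repair from")
    source = good[0]
    repaired = []
    for p in replicas:
        if p not in good:
            shutil.copyfile(source, p)  # self-correction: overwrite the bad copy
            repaired.append(p)
    return repaired
```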

Page 9

Vision of Grid Data Management

Optimisation
  – Customisation or self-adaptation to specific access patterns
  – Distributed Querying, Data Analysis and Data Mining
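A simple form of self-adaptation to access patterns is to count how often a site reads a file remotely and to replicate it to that site once a threshold is reached. The class name, the threshold and the policy are hypothetical illustrations; a real optimiser would also weigh storage space, network load and replica lifetime.

```python
from collections import Counter


class AccessPatternReplicator:
    """Hypothetical policy: replicate a file to a site after N remote reads."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.remote_reads: Counter[tuple[str, str]] = Counter()
        self.local: set[tuple[str, str]] = set()  # (site, lfn) pairs held locally

    def record_read(self, site: str, lfn: str) -> bool:
        """Log one read; return True if this read triggers a new local replica."""
        key = (site, lfn)
        if key in self.local:
            return False  # already served locally, nothing to adapt
        self.remote_reads[key] += 1
        if self.remote_reads[key] >= self.threshold:
            self.local.add(key)  # stand-in for scheduling a replication job
            del self.remote_reads[key]
            return True
        return False
```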

Page 10

Grid Data Management Dependencies

Performance, Reliability, Availability, Usability

Media, Hardware

Operating System, Local File System, Network Software

Protocols, Storage System

Page 11

Existing Middleware for Grid Data Management – Overview

• Globus
  – GridFTP
  – Replica Catalog
  – Replica Manager
• EU DataGrid
  – GDMP
  – Replica Catalog
  – Replica Manager
  – Spitfire
• Condor
  – NeST
• PPDG
  – Magda
  – JASMine
  – GDMP
• GriPhyN/iVDGL
  – Virtual Data Toolkit
• Storage Resource Broker
• Storage Resource Manager
• ROOT
  – AliEn
• Nimrod-G

(Not exhaustive)

Page 12

Globus Data Management

• GridFTP
  – Fast, parallel file transfer
  – Towards a self-optimising system
  – Work on reliable file transfer on top
• Replica Catalog – jointly with EDG WP2
  – Configurable
  – Distributed, hierarchical
  – Scalable
• Replica Manager
• Security infrastructure
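GridFTP's parallel transfer moves different byte ranges of one file over several data streams at once. The sketch below imitates that idea locally: it splits a file into contiguous byte ranges and copies each range in its own thread, with every worker opening its own file handles. It assumes local paths stand in for network streams and reads each range into memory; it does not speak the GridFTP protocol or use any Globus API, and the function names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size: int, streams: int) -> list[tuple[int, int]]:
    """Split [0, size) into at most `streams` contiguous (offset, length) blocks."""
    if size == 0:
        return []
    block = -(-size // streams)  # ceiling division so no bytes are left over
    return [(off, min(block, size - off)) for off in range(0, size, block)]


def copy_range(src: str, dst: str, offset: int, length: int) -> int:
    """Copy one byte range; each 'stream' opens its own file handles."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        data = fin.read(length)
        fout.write(data)
    return len(data)


def parallel_copy(src: str, dst: str, streams: int = 4) -> int:
    """Copy src to dst with up to `streams` concurrent range copies; return bytes copied."""
    size = os.path.getsize(src)
    with open(dst, "wb") as fout:
        fout.truncate(size)  # preallocate so every stream can seek to its offset
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(lambda r: copy_range(src, dst, *r), byte_ranges(size, streams)))
```

Reliable transfer, also mentioned on the slide, would add retries per range and a final checksum comparison on top of this.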

Page 13

European DataGrid WP2

• GDMP – with PPDG
  – In production with CMS for Objectivity replication
  – Subscription-based replication
  – Scalable architecture
• Replica Catalog – with Globus
• Replica Manager and Optimiser
  – Takes the Globus RM as its core
  – Additional modules for pre- and postprocessing of data
• Replica Selection in the WP2 Optimisation task
  – Simulator to test replica selection
• Spitfire
  – Unified front-end to databases
  – Suitable for Grid and application metadata

[Figure: EDG layered architecture. Local Computing: Local Application, Local Database. Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping. Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler. Underlying Grid Services: Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication and Accounting, SQL Database Services. Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management. Also shown: Service Index.]
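Subscription-based replication, as in GDMP, can be pictured as a publish/subscribe relation between sites: a site that produces new files notifies its subscribers, which then obtain copies. The sketch below captures only that control flow; Site, subscribe and publish are hypothetical names, the "copy" is just a set insertion, and none of it is GDMP's actual interface or protocol.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality: two sites are never "the same" by value
class Site:
    """A hypothetical storage site holding a set of logical file names."""
    name: str
    files: set[str] = field(default_factory=set)
    subscribers: list["Site"] = field(default_factory=list)

    def subscribe(self, other: "Site") -> None:
        """`other` asks to receive every file this site publishes from now on."""
        if other is not self and other not in self.subscribers:
            self.subscribers.append(other)

    def publish(self, lfn: str) -> list[str]:
        """Add a new local file and push it to all subscribers.

        Returns the names of the sites that received a copy; in a real
        system each push would be a file transfer plus a catalog update.
        """
        self.files.add(lfn)
        delivered = []
        for sub in self.subscribers:
            if lfn not in sub.files:
                sub.files.add(lfn)
                delivered.append(sub.name)
        return delivered
```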

Page 14

WP2 Replica Manager Architecture

[Figure: Replica Manager components – Core API, Optimisation API, Replica Catalogue, Metadata Catalogue]
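The Optimisation API's task of choosing among replicas can be illustrated with a simple cost model: estimated transfer time equals link latency plus file size divided by bandwidth, and the cheapest replica wins. Both the cost model and the names (Replica, transfer_cost, best_replica) are assumptions for illustration; the slides do not specify how the WP2 optimiser ranks replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    pfn: str   # physical file name of this copy
    site: str  # site holding the copy


def transfer_cost(size_bytes: int, bandwidth_bps: float, latency_s: float) -> float:
    """Estimated seconds to fetch a file: startup latency plus size / bandwidth."""
    return latency_s + 8 * size_bytes / bandwidth_bps


def best_replica(replicas: list[Replica], size_bytes: int,
                 links: dict[str, tuple[float, float]]) -> Replica:
    """Pick the replica with the lowest estimated cost.

    `links` maps a site name to (bandwidth in bit/s, latency in s) as seen
    from the requesting site; sites without link information are skipped.
    """
    candidates = [r for r in replicas if r.site in links]
    if not candidates:
        raise LookupError("no reachable replica")
    return min(candidates, key=lambda r: transfer_cost(size_bytes, *links[r.site]))
```

With this model a large file goes to the highest-bandwidth site, while for very small files the latency term dominates, so a nearby low-latency site can win despite a slower link.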

Page 15

Condor Data Management

• Condor Matchmaking
  – Find the optimal resource
• Condor Network Storage (NeST)
  – Generic access to storage through an abstract storage interface
  – Virtual Protocol Layer
  – User Management and Reservation
• Chirp
  – Minimum set of file access requests
  – Meta-management requests
• Condor Bypass
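Condor matchmaking pairs jobs and machines whose advertisements are mutually acceptable and then prefers the candidate with the best rank. The sketch below mimics that two-sided requirements/rank idea with plain Python dicts and callables; it is a hypothetical representation, not the ClassAd language or any Condor API.

```python
from typing import Any, Optional

# An "ad" is a dict of attributes plus callables under "requirements" and
# (for jobs) "rank", loosely modelled on ClassAd expressions (hypothetical).
Ad = dict[str, Any]


def matches(job: Ad, machine: Ad) -> bool:
    """Symmetric match: each ad's requirements must accept the other ad."""
    return bool(job["requirements"](machine) and machine["requirements"](job))


def matchmake(job: Ad, machines: list[Ad]) -> Optional[Ad]:
    """Among mutually acceptable machines, return the one the job ranks highest."""
    candidates = [m for m in machines if matches(job, m)]
    return max(candidates, key=job["rank"], default=None)
```

For example, a job ad might require a LINUX machine with at least 512 MB of memory and rank candidates by free disk, while each machine ad can impose its own requirements on the job.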

Page 16

PPDG / GriPhyN Data Management

• Globus, Condor, SRB
• GDMP – with EDG
• Magda
  – To be used in ATLAS data challenges
  – Metadata catalog
• JASMine – JLAB Asynchronous Storage Manager
  – Storage Management and Resource
  – Replica catalog based on MySQL, as a Web Service
  – Replication service
  – File Server
• GriPhyN Virtual Data System

[Figure: GriPhyN Virtual Data System (shown twice in the original). Application → Planner → (DAG) → Executor → (DAG) → Compute Resource and Storage Resource, supported by Catalog Services, Info Services, Policy/Security, Monitoring, Replica Management and a Reliable Transfer Service. Technologies shown: DAGMAN, Kangaroo; GRAM; GridFTP, GRAM, SRM; GSI, CAS; MDS; MCAT, GriPhyN catalogs; GDMP; Globus. Legend: marked components indicate that an initial solution is operational.]
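In the GriPhyN Virtual Data System the Planner hands the Executor a DAG of jobs, a role DAGMan fills on the slide. The sketch below shows the core scheduling idea with Python's standard graphlib module (Python 3.9+) rather than DAGMan: jobs are released in waves once all of their inputs exist, and jobs within a wave could run concurrently. The job names are hypothetical, and failures and retries are ignored.

```python
from graphlib import TopologicalSorter


def execution_waves(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; every job in a wave has all its inputs ready.

    `dag` maps each job to the set of jobs whose outputs it consumes.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for reproducible output
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished successfully
    return waves
```

For a derivation chain such as raw data → reconstruction → ntuple, with a calibration step also reading the raw data and a plot needing both ntuple and calibration, the waves come out as raw first, then calibration and reconstruction together, then ntuple, then plot.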

Page 17

SRB, SRM

• SDSC Storage Resource Broker
  – Advanced resource techniques
  – Replica Catalog based on Oracle; the catalog itself is replicated using Oracle’s replication mechanism
• Storage Resource Manager (LBNL)
  – Interfaces to any storage system
  – Joint functional definition with EDG, PPDG and GriPhyN
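The point of an interface like SRM is that clients talk to one abstraction while different back ends (disk pools, tape-backed mass storage) implement it. The sketch below shows that design with a Python abstract base class; the method name prepare_to_get, the staging logic and the gsiftp URL format are hypothetical simplifications and should not be read as the actual SRM operations or their asynchronous semantics.

```python
from abc import ABC, abstractmethod


class StorageInterface(ABC):
    """Hypothetical uniform front-end over heterogeneous storage systems."""

    @abstractmethod
    def prepare_to_get(self, path: str) -> str:
        """Make `path` readable and return a transfer URL for it."""


class DiskStorage(StorageInterface):
    def __init__(self, host: str) -> None:
        self.host = host

    def prepare_to_get(self, path: str) -> str:
        # Files on disk are immediately accessible.
        return f"gsiftp://{self.host}{path}"


class TapeStorage(StorageInterface):
    def __init__(self, host: str) -> None:
        self.host = host
        self.disk_cache: set[str] = set()  # paths already staged from tape

    def prepare_to_get(self, path: str) -> str:
        if path not in self.disk_cache:
            self._stage(path)
        return f"gsiftp://{self.host}{path}"

    def _stage(self, path: str) -> None:
        # Stand-in for a (slow) tape recall into the disk cache.
        self.disk_cache.add(path)


def fetch_url(storage: StorageInterface, path: str) -> str:
    """Client code is identical whichever storage system sits behind the interface."""
    return storage.prepare_to_get(path)
```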

Page 18

Reference Technologies

• P2P technology
  – Gnutella
  – Napster
  – Freenet
  – Oceanstore
  – CHORD
  – CAN
  – JXTA Search
  – Mojo Nation
• Database technology
  – Replication
  – Distributed heterogeneous databases
  – Query planning and optimization
• Storage
  – Unitree
  – DMF
  – HPSS
  – Castor, Enstore, Eurostore
  – SAM
• File Systems
  – AFS, Coda, Intermezzo
  – NFS
  – GPFS, CXFS, GFS, DFS, DAFS
  – SlashGrid
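Several of the P2P systems listed, notably CHORD and CAN, locate data with distributed hash tables. The simplified sketch below shows the consistent-hashing rule behind Chord: node names and keys are hashed onto one identifier ring, and each key belongs to the first node at or after its position. It is deliberately minimal (a global sorted list instead of Chord's finger tables, and an assumed 32-bit ring), uses hypothetical names, and is not taken from any of the listed systems.

```python
import bisect
import hashlib

ID_BITS = 32  # assumed ring size, kept small for readability


def ring_id(key: str) -> int:
    """Hash a node name or key onto the identifier ring [0, 2**ID_BITS)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class Ring:
    """Successor lookup: a key is owned by the first node clockwise from it."""

    def __init__(self, nodes: list[str]) -> None:
        self._ids = sorted((ring_id(n), n) for n in nodes)

    def successor(self, key: str) -> str:
        kid = ring_id(key)
        # First node whose id is >= the key id; "" sorts before any name.
        i = bisect.bisect_left(self._ids, (kid, ""))
        return self._ids[i % len(self._ids)][1]  # wrap around past the top
```

A useful property follows directly: removing a node other than a key's successor leaves that key's owner unchanged, so only a small share of keys move when nodes join or leave.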

Page 19

Application to LCG Project

• Bridge the gap between the experiments' immediate needs for production-quality grid middleware and the existing prototype middleware
  – Evolve existing grid middleware into production-quality services
  – The LCG Project is a Deployment Grid; nevertheless, we will need to do some development
• Specialization of existing Grid middleware to the LHC environment, explicitly to the tiered architecture model
• Very close relations to the Application Area Physics Data Management task

[Figure: tiered architecture model (shown twice in the original). Tier 0: Online System and Offline Farm at the CERN Computer Center. Tier 1: Fermilab and the France, Italy and UK Regional Centers. Tier 2: Tier2 Centers. Tier 3: Institutes. Workstations with a physics data cache. Link rates shown: ~PBytes/sec, ~100 MBytes/sec, 0.6–2.4 Gbits/sec, ~2.4 Gbits/sec, 100–1000 Mbits/sec. Also labelled: AFS, GDM.]
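The link rates in the tier diagram translate directly into transfer times. The helper below does that arithmetic under stated assumptions: decimal units (1 MByte = 10^6 bytes, 1 Gbit = 10^9 bits), no protocol overhead unless an efficiency factor is given, and a hypothetical 1 TB dataset in the example figures. The function names are illustrative only.

```python
def transfer_time_s(volume_bytes: float, link_bits_per_s: float,
                    efficiency: float = 1.0) -> float:
    """Seconds to move `volume_bytes` over a link with the given raw bit rate.

    `efficiency` (0 < e <= 1) is an assumed fraction of the raw rate that is
    actually achieved; 1.0 means an ideal, fully used link.
    """
    return 8 * volume_bytes / (link_bits_per_s * efficiency)


def mbytes_to_gbits(rate_mbytes_per_s: float) -> float:
    """Convert MBytes/sec to Gbits/sec (decimal units)."""
    return rate_mbytes_per_s * 8 / 1000
```

For instance, ~100 MBytes/sec corresponds to 0.8 Gbits/sec, and a 1 TB dataset needs about 3,333 s (roughly 56 minutes) at 2.4 Gbits/sec but about 13,333 s (roughly 3.7 hours) at 0.6 Gbits/sec, before any real-world inefficiency.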

Page 20

Issues / Dangers

• Commonalities – solving the same problems again and again; potential for duplication of effort
  + Think in Virtual Organisations
  + RTAGs, like the Common Persistency Framework
• Security – I can see what you can’t see
  + EDG Security Group – see Dave Kelsey’s talk
  + SciDAC
  + Building trust relationships
• Standardisation – bringing it all together and agree, agree, agree
  + OGSA
  + GGF
• Consensus – too many cooks spoil the broth
  + Making decisions in time
  + Keeping agreements, sticking to standards
  + Avoiding micromanagement

