+ All Categories
Home > Documents > 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13...

1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13...

Date post: 05-Jan-2016
Category:
Upload: beverley-ellis
View: 214 times
Download: 0 times
Share this document with a friend
Popular Tags:
32
1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB and not the PB to the KB
Transcript
Page 1: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

1

PROOFThe Parallel ROOT Facility

Gerardo Ganis / CERN

CHEP06, Computing in High Energy Physics13 – 17 Feb 2006, Mumbai, India

Bring the KB to the PB and not the PB to the KB

Page 2: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

2

What is this talk about

G. Ganis, CHEP06, 13 Feb 2006

End-user analysis of HEP data on distributed systems usingthe ROOT data model

ROOT ? Package providing:

Efficient data storage supporting structured data sets Efficient query system to access the information Complete set of tools for scientific analysis Advanced 2D / 3D visualization and GUI systems C++ interpreter

HEP data ? Collections of independent events

Page 3: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

3

The ROOT data model: Trees & Selectors

G. Ganis, CHEP06, 13 Feb 2006

Begin()•Create histos, …•Define output list

Process()

preselection analysis

Terminate()•Final analysis (fitting, …)

output listSelector

loop over events

OK

event

branch

branch

leaf

leafleaf

branch

leafleaf

1 2 n last

n

read neededparts only

Chain

branch

leaf leaf

Page 4: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

4

End-User Analysis scenarios

G. Ganis, CHEP06, 13 Feb 2006

Analysis performance typically I/O bound

Analysis requirements of LHC experiments (~100 users) CPU: ~ MSI2k (~ 250 Intel Core Duo) Storage: ~ 0.3-1.6 PB storage Bandwidth: ~ 100 MB/s

Intrinsic parallelism of data classically exploited by splittinglong analysis jobs in smaller sub-jobs addressing differentportions of data and submitted (run) concurrently

Farms of fewhundred nodes

Good accessto data required

Page 5: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

5

“Classic” approach

G. Ganis, CHEP06, 13 Feb 2006

StorageBatch farm

queues

manager

outputs

catalog

query

“static” use of resources jobs frozen: 1 job / worker node

“manual” splitting, merging limited monitoring (end of single job)

submit

files

jobsdata file splitting

myAna.C

mergingfinal analysis

Page 6: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

6

The PROOF approach

G. Ganis, CHEP06, 13 Feb 2006

catalog StoragePROOF farm

scheduler

query

farm perceived as extension of local PC same macro, syntax as in local session

more dynamic use of resources real time feedback automated splitting and merging

MASTER

PROOF query:data file list, myAna.C

files

final outputs(merged)

feedbacks

(merged)

Page 7: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

7

PROOF – Multi-tier Architecture

G. Ganis, CHEP06, 13 Feb 2006

good connection ?VERY importantless important

Optimize for data locality or efficient data server access

adapts to clusterof clusters orwide area virtual clusters

Geographically separated domains;

heterogenous machine types

proofd

Page 8: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

8

PROOF: ingredients

G. Ganis, CHEP06, 13 Feb 2006

PROOF servers are full-featured ROOT applications communication layer setup via light daemons ( (x)proofd )

authentication: password-based, GSI, Kerberos, … Dynamic load balancing

Pull architecture: workers ask for work when finished faster workers get more work

Merging infrastructure via Merge() implemented for standard objects: histograms, trees, … user-defined strategy by overloading / defining Merge()

Feedback at tunable frequency: Standard statistics histograms (events/packets per node, …) Temporary version of any output object

Package manager for optimized upload of additional libraries needed by the analysis

Page 9: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

9

PROOF – Scalability

G. Ganis, CHEP06, 13 Feb 2006

32 nodes: dual Itanium II 1 GHz CPU’s,2 GB RAM, 2x75 GB 15K SCSI disk,1 Fast Eth, 1 GB Eth nic (not used)

Each node has one copy of the data set(4 files, total of 277 MB), 32 nodes:8.8 Gbyte in 128 files, 9 million eventsEfficiency ~ 90 %

8.8GB, 128 files1 node: 325 s32 nodes in parallel: 12 s

Case of data locality

Page 10: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

10

PROOF: data access and scheduling issues

G. Ganis, CHEP06, 13 Feb 2006

Low latency in data access is essential Minimize file opening overhead (asynchronous open) Caching, asynchronous read-ahead of required segments Supported by xrootd

Scheduling of large numbers of users: Interface to generic resource broker to optimize the load Batch systems have this already:

concrete implementations exist for LSF, Condor planned for BQS, Sun Grid Engine, …

On the GRID, use available services to determine the session configuration

F. Furano, #368, Feb 15th, 16:20A. Hanushevsky, #407, Feb 15th, 17:00

Page 11: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

11

PROOF @ GRID

G. Ganis, CHEP06, 13 Feb 2006

PROOFPROOFMASTERMASTERSERVERSERVER

USER SESSIONUSER SESSION

Guaranteed site access throughPROOF Sub-Masters calling outto Master (agent technology)

Grid/Root Authentication

Grid Access Control Service

TGrid UI/Queue UI

Proofd Startup

GRID Service Interfaces

Grid File/Metadata Catalogue

Client retrieves listof logical files (LFN + MSN)

Slave servers access data via xrootd from local disk pools

PROOF PROOF SLAVE SLAVE SERVERSSERVERS

PROOFPROOFSUB-MASTERSUB-MASTER

SERVERSERVER

PROOF PROOF SLAVE SLAVE SERVERSSERVERS

PROOF PROOF SLAVE SLAVE SERVERSSERVERS

PROOFPROOFSUB-MASTERSUB-MASTER

SERVERSERVER

PROOFPROOFSUB-MASTERSUB-MASTER

SERVERSERVER

ROOTROOT

Demo’ed by ALICEat SC03, SC04, …

Page 12: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

12

What is our goal?

Page 13: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

13

Typical end-user job-length distribution

G. Ganis, CHEP06, 13 Feb 2006

Interactive analysis usinglocal resources, e.g.- end-analysis calculations- visualization

Analysis jobs with well defined algorithms (e.g. production of personal trees)

Medium term jobs, e.g.analysis design and development using alsonon-local resources

Goal: bring these to thesame level of perception

Page 14: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

14

Sample of analysis activity

G. Ganis, CHEP06, 13 Feb 2006

AQ1: 1s query produces a local histogram

AQ2: a 10mn query submitted to PROOF1

AQ3->AQ7: short queries

AQ8: a 10h query submitted to PROOF2

BQ1: browse results of AQ2

BQ2: browse temporary results of AQ8

BQ3->BQ6: submit 4 10mn queries to PROOF1

CQ1: Browse results of AQ8, BQ3->BQ6

Monday at 10h15ROOT sessionon my laptop

Monday at 16h25ROOT sessionon my laptop

Wednesday at 8h40

Browse from any web browser

Page 15: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

15

What do we have now?

G. Ganis, CHEP06, 13 Feb 2006

Most of the ingredients in place support for multi- sessions

asynchronous (non-blocking) running mode support for disconnect / reconnect

Use xrootd as launcher of server sessions query-results classification and management

retrieve / archive / remove Command-line controlled via TProof API …

… but also GUI controlled Virtual Demo

Page 16: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

16

A real PROOF session - connection

G. Ganis, CHEP06, 13 Feb 2006

Predefined session

Define new session

Session startup progress bar Session startup

statusready

Page 17: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

17

A real PROOF session - package manager

G. Ganis, CHEP06, 13 Feb 2006

Package tab

PAR (Proof ARchive) ROOT-INF directory, BUILD.sh, SETUP.C

Control setup of each worker

Page 18: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

18

A real PROOF session - query definition and running

G. Ganis, CHEP06, 13 Feb 2006

nameExecute to

create chain

Select chain

Choose selector

Feedback histograms

Processing information

Page 19: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

19

A real PROOF session: query browsing and finalization

G. Ganis, CHEP06, 13 Feb 2006

Details about the query

Folder with output objects

raw histogram

finalizationfinalization

Page 20: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

20

A real PROOF session: disconnection / reconnection

G. Ganis, CHEP06, 13 Feb 2006

Running sessions kept alive by server side coordinator Reconnection is much faster: no process to fork

disconnectreconnect The query is now

terminated

Page 21: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

21

A real PROOF session: chain viewer

G. Ganis, CHEP06, 13 Feb 2006

Chain viewer

Right-click

Page 22: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

22

Ongoing activities / Near Future Plans

G. Ganis, CHEP06, 13 Feb 2006

Data file upload manager optimally distribute data on the cluster storage keep track of existing data sets for optimized re-run

Dynamic cluster configuration come-and-go functionality for worker nodes olbd network to get info about the load on the cluster

Optimizations packetizer (re-assignment of being-processed packets to fast idle slaves) data access (fully exploit asynchronous features of xrootd)

Monitoring of cluster behaviour MonAlisa: allows definition of ad hoc parameters, e.g. I/O / node / query

Improve handling of error conditions identify cases hanging the system, improve error logging, … exploit olbd control network for better overview of the cluster

Testing and consolidation Documentation

Page 23: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

23

Who’s using PROOF?

G. Ganis, CHEP06, 13 Feb 2006

ALICE (see, e.g., PROOF@AliEn at CHEP04; LHCC review Nov 2005) PHOBOS CMS analysis prototype

Development test-beds Currently using CERN phased out machines:

35 dual Pentium III 800 MHz / 512 MB RAM 100 MBit/s Ethernet 600 GB total hard disk

Contact with Lyon / CC-IN2P3 (ALICE) up to 16 dual Xeon 2.8 GHz, 200 GB scratch each

Request for new test-bed at CERN (LCG / ALICE / CMS)

M. Ballantijn, #374, Feb 15th, 16:40

I. Gonzáles, #267, Feb 14th, 16:00

Page 24: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

24

People working on the project

G. Ganis, CHEP06, 13 Feb 2006

B. Bellenot, M. Biskup, R. Brun, G. Ganis, J. Iwaszkiewicz, G. Kickinger, P. Nilsson, A. Peters, F. Rademakers (CERN) M. Ballintijn, C. Loizides, C. Reed (MIT) D. Feichtinger (PSI) P. Canal (FNAL)

Page 25: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

25

The End

G. Ganis, CHEP06, 13 Feb 2006

Questions?

Links http://root.cern.ch/root/PROOF.html Discussion topic at the ROOT forum http://root.cern.ch/phpBB2/

Page 26: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

26

PROOF – backup slides

G. Ganis, CHEP06, 13 Feb 2006

Additional recent improvements Stateless connection with XrdProofd

XrdProofd basics Connection layer

Pull architecture PROOF@AliEn: command-line session

Page 27: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

27

Additional recent improvements

G. Ganis, CHEP06, 13 Feb 2006

Session startup Parallel startup with threads Optimized sequential startup Startup status and progress bar

New progressive packetizer: open files as needed continuously re-estimate # of entries can order files based on availability

Draw() and viewer functionality via PROOF

Page 28: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

28

Interactive batch: stateless connection with XrdProofd

G. Ganis, CHEP06, 13 Feb 2006

Needs coordination of alive sessions when clients disconnect existing proofd required deep re-design xrd (networking, work dispatching layer of xrootd)

generic framework to handle protocols used already to launch rootd for TNetFile clients

Dedicated protocol to launch and manage PROOF sessions forked in separated processes to protect from client bugs talking to coordinator via UNIX socket

Disconnect / reconnect handled naturally Asynchronous reading allows to setup a control interrupt network independent of OOB Xrd/olbd control network used to query status information

A.Hanushevsky, #407,

Feb 15th, 17:00

Page 29: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

29

XrdProofd basics

G. Ganis, CHEP06, 13 Feb 2006

Prototype based on XROOTD

XROOTD

links

XrdXrootdProtocol

files

MT stuff

user 1

XPD

links

XrdProofdProtocol

proofserv

XrdProofdProtocol: client gateway to proofserv static area for all client information and its activities

staticarea

MT stuff Wor

ker

serv

ers

user 1

user 2……

user 2

Page 30: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

30

clientxc

slave n

XrdProofd

XS

slave 1

XrdProofd

XS

master

XrdProofd XS

XS

xcXRD links

TXSocket

xcproofserv

xcfork()

proofslave

xc

fork() proofslave

xc

fork()

XrdProofd communication layer

G. Ganis, CHEP06, 13 Feb 2006

Page 31: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

31

Workflow: Pull Architecture

G. Ganis, CHEP06, 13 Feb 2006

dynamic load balancing naturally achieved

Page 32: 1 PROOF The Parallel ROOT Facility Gerardo Ganis / CERN CHEP06, Computing in High Energy Physics 13 – 17 Feb 2006, Mumbai, India Bring the KB to the PB.

32

PROOF @ AliEn: command-line session

G. Ganis, CHEP06, 13 Feb 2006

TGrid: abstract interface for all services

// ConnectTGrid *alien = TGrid::Connect(“alien://”);

// QueryTString path= “/alice/cern.ch/user/p/peters/analysis/miniesd/”;TGridResult *res = alien->Query(path, ”*.root“);

// Create chain from list of filesTChain chain(“Events", “session“, res->GetFileInfoList());

// Open a PROOF sessionTProof *proof = TProof::Open(“proofmaster”);

// Process your querychain.Process(“selector.C”);


Recommended