
Roadmap to AliEn v2-20

Page 1: Roadmap to  AliEn  v2-20

Experiment Support

CERN IT Department

CH-1211 Geneva 23

Switzerland

www.cern.ch/it

DBES

A. Abramyan, L. Betev, D. Goyal, A. Grigoras, C. Grigoras, M. Litmaath, N. Manukyan,

M. Martinez, J. Porter, P. Saiz, S. Sankar, S. Schreiner

Roadmap to AliEn v2-20

Page 2: Roadmap to  AliEn  v2-20


9 Mar 2012 Pablo Saiz ALICE offline week

What’s new

• Plenty of new improvements
  – Catalogue simplification
  – Client UI
  – Extreme Job Brokering
  – Removal of PackMan
  – New JDL fields
  – Proxy renewal
  – Job Memory checkup
• And baseline for new development

Page 3: Roadmap to  AliEn  v2-20


Catalogue Simplification

• Up to now, catalogue divided in multiple DB:
  – Simplifies scalability
  – Logic slightly more complicated
• Changing username/userid
  – Smaller tables

Thanks Dushyant, Subho

Page 4: Roadmap to  AliEn  v2-20


PackMan

• Removing the PackMan/PackManMaster services
• Functionality stays in client UI/JA
  – JA can install packages directly
  – Very powerful if combined with torrent
• Speeds up most of the packman operations

Thanks Narine, Armenuhi

Page 5: Roadmap to  AliEn  v2-20


New JDL fields

• MaxWaitingTime: amount of time that job can stay in ‘WAITING’
  – If time exceeded, job ends up in error
  – New state: ERROR_EW (Expired Waiting)
• Retrial:
  – Number of times that a single job can be resubmitted
  – Resubmission done by central services
• Reusing JobId in resubmission
• Direct removal of KILLED jobs
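How MaxWaitingTime and Retrial might interact can be sketched in a few lines of Python. This is an illustration only, not AliEn code. The `Job` class, the `check_waiting` and `resubmit` helpers, the use of seconds as the unit, and the idea that an ERROR_EW job is eligible for a retrial are all assumptions. Only the field names and the states WAITING and ERROR_EW come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    max_waiting_time: float  # MaxWaitingTime from the JDL (unit assumed: seconds)
    retrial: int             # Retrial from the JDL: resubmissions still allowed
    state: str = "WAITING"
    waiting_since: float = 0.0


def check_waiting(job: Job, now: float) -> None:
    """Move a job to ERROR_EW once it has waited longer than MaxWaitingTime."""
    if job.state == "WAITING" and now - job.waiting_since > job.max_waiting_time:
        job.state = "ERROR_EW"


def resubmit(job: Job, now: float) -> bool:
    """Hypothetical central-service step: back to WAITING, same JobId."""
    if not job.state.startswith("ERROR") or job.retrial <= 0:
        return False
    job.retrial -= 1
    job.state = "WAITING"
    job.waiting_since = now
    return True


job = Job(job_id=42, max_waiting_time=3600, retrial=1)
check_waiting(job, now=4000)           # waited 4000 s > 3600 s
first = (job.state, job.job_id)
resubmitted = resubmit(job, now=4000)  # uses up the single retrial
check_waiting(job, now=9000)           # expires again
second_try = resubmit(job, now=9000)   # no retrials left
```

The job keeps `job_id` 42 through the resubmission, which mirrors the "Reusing JobId in resubmission" bullet. Once the retrial budget is used up, the job stays in ERROR_EW.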

Thanks Miguel

Page 6: Roadmap to  AliEn  v2-20


Extreme Brokering

• Postpone splitting of job until last moment
• Decide data to be analyzed based on current location of JA & files not analyzed yet
• Can define Max/Min number of files to be analyzed
  – Even if the files are not local
• Fewer subjobs:
  – Easier merging
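The selection step described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the AliEn broker. The function `pick_files`, its parameters and the replica map are hypothetical. Treating the minimum as "top up with non-local files when too few local ones are left" is one possible reading of the bullet about files that are not local.

```python
def pick_files(site, remaining, replicas, min_files=1, max_files=10):
    """Choose files for a job agent that has just started at `site`.

    remaining: set of file names not analyzed yet (updated in place)
    replicas:  dict mapping file name -> set of sites holding a replica
    """
    # Prefer files with a replica at the agent's own site, up to the maximum.
    local = sorted(f for f in remaining if site in replicas[f])
    chosen = local[:max_files]
    if len(chosen) < min_files:
        # Assumed behaviour: top up with non-local files to reach the minimum.
        remote = sorted(f for f in remaining if site not in replicas[f])
        chosen += remote[: min_files - len(chosen)]
    remaining.difference_update(chosen)
    return chosen


# Hypothetical replica layout for illustration.
replicas = {"f1": {"A"}, "f2": {"A", "B"}, "f3": {"B"}, "f4": {"C"}}
remaining = set(replicas)

at_a = pick_files("A", remaining, replicas, min_files=3, max_files=3)
at_c = pick_files("C", remaining, replicas, min_files=1, max_files=3)
at_b = pick_files("B", remaining, replicas, min_files=1, max_files=3)
```

Here the agent at site A takes its two local files plus one remote file to reach the minimum, the agent at C takes the last file, and the agent at B finds nothing left and gets an empty list.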

Thanks Pablo

Page 7: Roadmap to  AliEn  v2-20


Current situation

Works nicely if one replica per file

A bit more complex with 3 SE and 2 replicas

And a lot more with 50 SE and 3 replicas

[Diagrams: a JobManager fanning out to JOB boxes, with more JOB boxes in each successive case]
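One way to read the growth in these three cases is to assume that the splitting groups files by the set of SEs holding their replicas. The slide does not state this. Under that assumption, the number of distinct groups is bounded by the number of ways to choose the replica SEs. The helper name `max_groups` is hypothetical.

```python
from math import comb


def max_groups(num_se: int, replicas_per_file: int) -> int:
    """Upper bound on distinct replica-location sets: C(num_se, replicas_per_file)."""
    return comb(num_se, replicas_per_file)


three_se = max_groups(3, 2)    # 3 possible pairs of SEs
fifty_se = max_groups(50, 3)   # 19600 possible triples of SEs
```

With one replica per file the bound is simply the number of SEs, which is why that case stays manageable.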

Page 8: Roadmap to  AliEn  v2-20


Example

[Figure: grid of File 1 to File 5 against Site A, Site B and Site C, showing where each replica is stored]

Current schema
  – Submit 4 jobs: File 1 + File 4, File 2, File 3, File 5

Broker per file
  – Submit 3 empty subjobs
  – When a job starts, analyze as much as possible (File 1, 2, 4, 5 for one subjob, File 3 for the next)
  – If nothing left, just exit
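The two schemas in this example can be compared with a short sketch. The replica layout below is hypothetical, picked only so that the outcome matches the numbers on the slide. The function names, and the reading of the current schema as "one job per distinct set of sites", are assumptions as well.

```python
from collections import defaultdict

# Hypothetical replica layout, chosen to reproduce the slide's numbers.
replicas = {
    "File1": {"A", "B"},
    "File2": {"A", "C"},
    "File3": {"B", "C"},
    "File4": {"A", "B"},
    "File5": {"A"},
}


def split_by_replica_set(replicas):
    """Assumed current schema: one job per distinct set of sites holding the files."""
    groups = defaultdict(list)
    for name in sorted(replicas):
        groups[frozenset(replicas[name])].append(name)
    return sorted(groups.values())


def broker_per_file(replicas, start_order):
    """Late binding: each empty subjob claims the local files still pending."""
    remaining = set(replicas)
    work = []
    for site in start_order:
        claimed = sorted(f for f in remaining if site in replicas[f])
        remaining -= set(claimed)
        work.append((site, claimed))  # an empty list means the subjob just exits
    return work


static_jobs = split_by_replica_set(replicas)
late_jobs = broker_per_file(replicas, start_order=["A", "B", "C"])
```

With this layout the static split yields four jobs (File1 with File4, then File2, File3 and File5 on their own). With late binding, the subjob starting at site A takes File1, File2, File4 and File5, the one at B takes File3, and the one at C finds nothing left and exits.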

Page 9: Roadmap to  AliEn  v2-20


Proxy renewal system

• Replaces vobox-proxy-renewal service
• Can receive ‘validity’ or proxies
  – Simplifies CREAM-CE job submission
• No corruption of proxies
• Can be started by non-root user
• Already deployed at CERN
  – And for some CMS sites…
• Can already be deployed

Thanks Maarten

Page 10: Roadmap to  AliEn  v2-20


New development

• More than 1 year since last major update
• Some backward incompatible changes
  – Change of catalogue schema
• What to do with new requests, bugs:
  – Debug current system?
  – Debug in new version?
  – Both!

Page 11: Roadmap to  AliEn  v2-20


AliEn deployment for ALICE

[Diagram: deployment layers with the number of machines and the AliEn version at each layer, plus a BACKUP box]

catalogue, TaskQueue, Transfers, LDAP: 3 machines (+1 slave, backups), AliEn v2-17
Central Services: 12 machines, AliEn v2-19**, v2-17
Api: 8 machines, AliEn v2-19**
vobox: 80 sites, AliEn v2-19.(80-163)
JA: 40.000 wn, AliEn v2-19.(80-163)
Clients: aliensh, ROOT

Page 12: Roadmap to  AliEn  v2-20


How to test new versions…

• Build system:
  – Multiple platforms
  – Integration & basic functionality tests
    • No API/access from ROOT tests
  – Similar to the AliROOT, ROOT build systems
  – Running the whole system on a single machine
  – http://alienbuild.cern.ch:8888

Page 13: Roadmap to  AliEn  v2-20


Already deployed for PANDA

• Running since September
  – 12th PANDA Grid Workshop and 2nd AliEn Developers Week
• Multiple sites, smaller load than ALICE
  – No API services
  – ‘Old’ v2.20 version

Thanks PANDA

Page 14: Roadmap to  AliEn  v2-20


Previous major update

• Stopping the whole system
  – 1 week to redeploy
  – 1 month ironing out details

Not an option!

Page 15: Roadmap to  AliEn  v2-20


Second set of services:

[Diagram: two parallel copies of the full stack, each with catalogue, TaskQueue, Transfers, LDAP, Central Services, Api, aliensh, CE, ROOT and JA]

Page 16: Roadmap to  AliEn  v2-20


Second set of services

• Copy of the catalogue
• 3 different central machines, 3 voboxes, same SE
• What to do with output
  – Throw away (easiest)
  – Incorporate back (easy if output in a different directory)

Page 17: Roadmap to  AliEn  v2-20


Timeline

Now:
  – 1 week: Investigate test system
  – 1 week: Test Catalogue migration
  – 1 week: Define New VO
  – 1 week: Verify quotas

1 month:
  – New hardware for CS
  – 2 days: Central deployment from backup
  – 3 days: First site working (CERN)
  – 2 weeks: At least 2 external sites (CCIN2P3, ?)
  – After that works, keep adding sites

2 months:
  – 1 day: Switch VO
  – 1 day: Overall site upgrade

[Timeline axis: Mar, Apr, May]

Page 18: Roadmap to  AliEn  v2-20


Summary

• AliEn v2.20 ready for deployment
  – With plenty of new features and bug fixes
• Minimize upgrade downtime
  – Create testing setup with several sites, and with all the SE
  – More effort on testing (also from site admins)
• Deploy Test VO with ALICE sites
• And say goodbye to v2-19 in two months

Thank you!!

Page 19: Roadmap to  AliEn  v2-20


Job execution

[Diagram: JobManager, TASKQUEUE and Job Broker on the central side; File catalogue (LFN, GUID, Metadata); Sites A, B and C, each with a CE, a JA running JOBs, xrootd and MonALISA]

