SFT Group Review: Additional projects, future directions and overall planning SPI project (WP8)...


SFT Group Review: Additional projects, future directions and overall planning

SPI project (WP8), Multi-core (WP9), Virtualization, Other projects, Vision, Planning

September 30th, 2009


SPI


Software Process and Infrastructure: External libraries service

The LHC experiments use about 100 external libraries (open-source and public-domain) – see http://lcgsoft.cern.ch/

Automated building and distribution of all packages from source for all AA-supported platforms – fast turnaround

Recently introduced new compilers and OSs (slc5, gcc 4.3, VS9, icc11). Release management of the AA software stack for the LHC experiments.

Coordination is done via the "Librarians and Integrators meeting" (low level) and the "Architects Forum" (high level)

Over the last year we have deployed two major release series (LCG 55 and 56), each with three bug-fix releases on top (a-c)

Moving to new compilers (e.g. gcc 4.3 for slc5). Optionally releasing parts of ROOT separately. The release infrastructure is also used by experiments outside the LHC (DayaBay, Memphis, Dusel).


[Figure: The LHC Software Stack – the LHC experiment software (AliRoot, CMSSW, LHCb/Gaudi, ATLAS/Athena) sits on top of the common software: the AA projects (ROOT, POOL, COOL, CORAL, RELAX) and ~100 external packages (Python, Boost, Qt, Xerces, GSL, valgrind, Java, Grid middleware, ...), built with several compilers (gcc 3.4/4.0/4.3, icc 11, llvm 2.4, vc 7.1, vc 9) in 32- and 64-bit mode on Linux (slc4, slc5), Mac OS X (10.5) and Windows (XP).]


Software Process and Infrastructure (2): Nightly build, testing and integration service

The nightly builds build and test all software on all AA-provided platforms (Scientific Linux, Windows, Mac OS)

Currently extending to new compiler suites (icc) to improve software robustness, and moving forward to Mac OS X 10.6

The nightlies are used in "chains", with the LHC experiments building on top: a fast feedback loop about changes in the AA software. Currently integrating the nightly builds with CernVM – almost finished.

Collaborative tools (HyperNews, Savannah): Savannah is heavily used within the LHC and outside (CERN/IT, Grid, etc.). The HyperNews service has been migrated to e-groups/SharePoint for all LHC experiments except CMS. New effort for an AA-wide web infrastructure based on Drupal (uniform look and feel, better integration between the sites).

Infrastructure in general for the rest of the group.


Savannah Usage


[Charts: Savannah usage from October 2007 to August 2009 – new bugs and new tasks per month, postings per month, registered users, registered projects, and bugs per experiment/project type (ALICE, ATLAS, CMS, LHCb, CERN IT, EGEE, LCG AA, LHC Grid, other).]


MULTI-CORE R&D (WP8)


Main Goals

Investigate software solutions to efficiently exploit the new multi-core architectures of modern computers, addressing the problems the experiments will need to solve: memory consumption, which will get worse as we go to higher luminosities, and CPU efficiency, to keep up with the computational needs.

Ongoing investigations cover four areas: parallelization at event level using multiple processes; parallelization at event level using multiple threads; high-granularity parallelization of algorithms; and optimization of memory access to adapt to the new memory hierarchy.

Collaboration established with LHC experiments, Geant4, ROOT and OpenLab (CERN/IT)


Current Activities

Close interaction with the experiments (bi-weekly meetings, reports in the AF). Workshops every six months (the latest in June, with IT, on "deployment"). Training and working sessions in collaboration with OpenLab and Intel.

ATLAS has developed a prototype using fork and copy-on-write (COW); CMS is investigating the same route and the use of OpenMP in algorithms. The Gaudi team is investigating parallelization at the Python level. Geant4 has developed a multi-threaded prototype. ROOT is developing PROOF-Lite and the use of multiple threads in the I/O. A parallel version of Minuit (using OpenMP and MPI) has already been released in ROOT 5.24. A library to use shared memory has been developed within the project.


Parallelization of Gaudi Framework


• No change needed in user code or configuration
• Equivalent output (un-ordered events); a sketch of the idea follows below
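The actual Gaudi prototype works at the Python level (previous slide) and is not reproduced here. As a hedged, language-neutral illustration of why the output comes back equivalent but un-ordered, the following minimal C++ sketch forks workers that each process a share of the events and send (event, result) records back to the parent through a pipe; the parent collects them in whatever order they arrive. processEvent and the record layout are placeholders, not Gaudi API.

```cpp
// Minimal multi-process sketch: workers return per-event results through a
// pipe; the parent reads them back in arbitrary (un-ordered) arrival order.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

struct Record { int event; double result; };

static double processEvent(int evt) { return 0.5 * evt; }   // placeholder algorithm chain

int main() {
    const int nWorkers = 4, nEvents = 20;
    int fd[2];
    if (pipe(fd) != 0) return 1;                 // workers write to fd[1], parent reads fd[0]
    for (int w = 0; w < nWorkers; ++w) {
        if (fork() == 0) {                       // worker: every nWorkers-th event
            close(fd[0]);
            for (int evt = w; evt < nEvents; evt += nWorkers) {
                Record r{evt, processEvent(evt)};
                write(fd[1], &r, sizeof r);      // small records are written atomically
            }
            _exit(0);
        }
    }
    close(fd[1]);                                // parent keeps only the read end
    Record r;
    while (read(fd[0], &r, sizeof r) == (ssize_t)sizeof r)
        std::printf("event %2d -> %.1f\n", r.event, r.result);  // un-ordered, but complete
    while (wait(nullptr) > 0) {}                 // reap the workers
}
```

Tolerating the arbitrary ordering is what keeps the parent's collection step simple; a full framework would additionally have to forward configuration to the workers and merge their output files, which is where the real complexity lies.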


Exploit Copy on Write (COW)

Modern operating systems share read-only pages among processes dynamically: a memory page is copied and made private to a process only when it is modified (illustrated in the sketch below).

Prototypes in ATLAS and LHCb. Encouraging results as far as memory sharing is concerned (about 50% shared). Concerns about I/O (the output from multiple processes needs to be merged).


Memory (ATLAS) – one process: 700 MB VMem and 420 MB RSS. With COW:
(before) evt 0:  private: 004 MB | shared: 310 MB
(before) evt 1:  private: 235 MB | shared: 265 MB
. . .
(before) evt 50: private: 250 MB | shared: 263 MB

See Sebastien Binet’s talk @ CHEP09
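To make the mechanism concrete, here is a minimal sketch (an illustration, not the ATLAS/LHCb prototype code) in which a large conditions-like structure is filled once in the parent and then only read by forked workers: the kernel keeps those pages physically shared, and only the pages a worker actually writes become private to it, which is the behaviour behind the private/shared numbers above.

```cpp
// Copy-on-write sketch: data allocated and filled BEFORE fork() stays shared;
// only pages that a worker modifies are copied and become private to it.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int main() {
    // Stand-in for geometry/conditions data: ~400 MB, initialised once in the parent.
    std::vector<double> conditions(50 * 1000 * 1000, 1.0);

    for (int w = 0; w < 4; ++w) {
        if (fork() == 0) {                        // worker process
            double sum = 0.0;
            for (double c : conditions) sum += c; // reading keeps the pages shared
            conditions[w] = sum;                  // writing copies only the touched page
            std::printf("worker %d: sum = %.0f\n", w, sum);
            _exit(0);
        }
    }
    while (wait(nullptr) > 0) {}                  // parent waits for all workers
}
```

Each worker would write its own output stream, which is exactly the output-merging concern noted on the slide.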


Exploit "Kernel Shared Memory" (KSM)

KSM is a Linux kernel driver that allows identical memory pages to be shared dynamically between one or more processes. It was developed as a backend of KVM, to help memory sharing between virtual machines running on the same host. KSM scans only memory that has been registered with it: essentially, each memory allocation that is a candidate for sharing needs to be followed by a call to a registration function (see the sketch below).

CMS reconstruction of real data (cosmics with the full detector), with no code change: 400 MB private data, 250 MB shared data, 130 MB shared code.

ATLAS, with no code change: in a reconstruction job of 1.6 GB VM, up to 1 GB can be shared with KSM.
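For illustration, this is roughly what the registration looks like on Linux: memory is advertised to KSM with madvise(MADV_MERGEABLE), after which the KSM daemon may merge identical pages across registered processes. This is a sketch of the mechanism only; the "no code change" results quoted above were presumably obtained by hiding such calls behind the allocator or a preloaded library rather than by touching experiment code.

```cpp
// Minimal KSM registration sketch (Linux, kernel with CONFIG_KSM and
// /sys/kernel/mm/ksm/run set to 1). Only registered regions are scanned.
#include <sys/mman.h>
#include <cstdio>

int main() {
    const size_t size = 256UL * 1024 * 1024;                 // 256 MB candidate region
    void* buf = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Advertise the region to KSM: identical pages in this range may now be
    // merged with identical pages of other processes that did the same.
    if (madvise(buf, size, MADV_MERGEABLE) != 0)
        std::perror("madvise(MADV_MERGEABLE)");

    // ... load conditions/geometry data that other jobs on the host also load ...
    return 0;
}
```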


Parallel MINUIT

Minimization of a maximum likelihood or χ² function requires iterative computation of the gradient of the NLL function. The execution time scales with the number of free parameters θ and the number N of input events in the fit.

Two strategies for the parallelization of the gradient and NLL calculation (a sketch follows after this slide):
• Gradient or NLL calculation on the same multi-core node (OpenMP)
• Distribute the gradient over different nodes (MPI) and parallelize the NLL calculation on each multi-core node (pthreads): a hybrid solution


Alfio Lazzaro and Lorenzo Moneta
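As a hedged sketch of the first strategy (not the actual implementation released in ROOT 5.24): the NLL is a sum of independent per-event terms, NLL(θ) = -Σᵢ ln f(xᵢ; θ), so the loop over the N events can be split over the cores of one node with OpenMP. The pdf, dataset and parameter values below are toy placeholders. In the hybrid scheme, the per-parameter evaluations needed for the numerical gradient would additionally be distributed over nodes with MPI.

```cpp
// OpenMP sketch: parallelize the event loop of the negative log-likelihood.
// Compile with e.g. "g++ -O2 -fopenmp nll.cc".
#include <cmath>
#include <cstdio>
#include <vector>
#include <omp.h>

// Toy model: Gaussian with theta = {mean, sigma}; placeholder for the real PDF.
static double pdf(double x, const std::vector<double>& theta) {
    const double z = (x - theta[0]) / theta[1];
    return std::exp(-0.5 * z * z) / (theta[1] * std::sqrt(2.0 * M_PI));
}

// NLL(theta) = -sum_i ln f(x_i; theta): the per-event terms are independent.
static double negLogLikelihood(const std::vector<double>& data,
                               const std::vector<double>& theta) {
    double nll = 0.0;
    #pragma omp parallel for reduction(+ : nll)
    for (long i = 0; i < (long)data.size(); ++i)
        nll += -std::log(pdf(data[i], theta));
    return nll;
}

int main() {
    std::vector<double> data(1000000, 0.5);      // stand-in for the fitted dataset
    std::vector<double> theta = {0.0, 1.0};      // free parameters
    std::printf("NLL = %f with %d threads\n",
                negLogLikelihood(data, theta), omp_get_max_threads());
}
```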


Multi-Core R&D: Outlook

Recent progress shows that we shall be able to exploit next-generation multi-core with "small" changes to HEP code: exploit copy-on-write (COW) in multi-processing (MP); develop an affordable solution for sharing the output file; leverage the Geant4 experience to explore multi-threaded (MT) solutions.

Continue the optimization of memory-hierarchy usage: study data and code "locality", including "core affinity" (a sketch follows after this slide).

Expand the Minuit experience to other areas of "final" data analysis, such as machine-learning techniques. Investigate the possibility of using GPUs and custom FPGAs.

"Learn" how to run MT/MP jobs on the grid: workshop at CERN, June 25th-26th, http://indico.cern.ch/conferenceDisplay.py?confId=56353. Collaboration established with CERN/IT and LCG.
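One concrete, low-level handle for such locality and core-affinity studies, sketched here under the assumption of a Linux/glibc environment (this is an illustration, not a WP8 deliverable), is to pin a process or thread to a given core with sched_setaffinity and compare memory-access timings across placements.

```cpp
// Linux-specific sketch: pin the calling process to one core before running
// a memory-access benchmark, to study "core affinity" and data locality.
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(2, &mask);                                    // core 2, chosen arbitrarily here
    if (sched_setaffinity(0, sizeof mask, &mask) != 0) {  // pid 0 = the calling process
        std::perror("sched_setaffinity");
        return 1;
    }
    std::printf("now pinned, running on CPU %d\n", sched_getcpu());
    // ... run the memory/latency measurement of interest here ...
}
```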


Explore the new frontier of parallel computing

Scaling to many-core processors (96-core processors are foreseen for next year) will require innovative solutions: MP and MT beyond the event level; fine-grained parallelism (OpenCL, custom solutions?); parallel I/O.

WP8 is continuously transferring technologies and artifacts to the experiments: this will allow the LHC experiments to make the best use of current computing resources.

Computing technology will continue its route towards more and more parallelism: a continuous investment in following the technology and providing solutions adequate to the most modern architectures is instrumental to best exploit the computing infrastructure needed for the future of CERN.


VIRTUALIZATION R&D (WP9)


Problem: Software @ LHC

Millions of lines of code. Different packaging and software distribution models. Complicated software installation/update/configuration procedures. Long and slow validation and certification process. Very difficult to roll out a major OS upgrade (SLC4 -> SLC5). Additional constraints imposed by grid middleware development: effectively locked onto one Linux flavour, and the whole process is focused on the middleware rather than on the applications.

How can we effectively harvest the multi- and many-core CPU power of user laptops and desktops if LHC applications cannot run in such environments?


Virtualization R&D Project

Aims to provide a complete, portable and easy-to-configure user environment for developing and running LHC data analysis locally and on the Grid: code check-out, editing, compilation, small local tests, debugging, …; Grid submission, data access, …; event displays, interactive data analysis. No user installation of software is required; suspend/resume capability.

Independent of the physical software and hardware platform (Linux, Windows, Mac OS).


Virtualizing LHC applications

Starting from experiment software…

…ending with a custom Linux specialised for a given task


Developing CernVM

Quick development cycles, in close collaboration with the experiments. Very good feedback from enthusiastic users. The planning was presented at the kickoff workshop; the first release was ahead of plan (one year ago).

[Timeline: 2008 – kickoff workshop, preparation, Releases 0.5, 0.6, 0.7, 0.8, 0.91 (RC1) and 0.92 (RC2), on time. 2009 – Release 1.2, 2nd workshop, Releases 1.3.3 and 1.3.4, CSC2009, Release 1.4.0, Release 1.6.0 (Final), Release 2.0 (SL5).]

Semi-production operation in 2009, with stable and development branches.

Dissemination: 2nd workshop (organized with IT); ACAT, HEPIX, CHEP09; CERN School of Computing 2009; tutorials and presentations to the experiments.


Where are CernVM users?

~1200 different IP addresses


Download & Usage statistics

[Charts: number of downloads per CernVM version (0.1 through 1.2), and the split of usage among ATLAS, ALICE, CMS, LHCB and LCD (shares of 45%, 17%, 7%, 28% and 3% as labelled in the original figure).]


Transition from R&D to Service


CernVM Infrastructure

We have developed, deployed and are operating a highly available service infrastructure to support the 4 LHC experiments and LCD.

If CernVM is to be transformed into a proper 24x7 service, this infrastructure will have to be moved to IT premises.

An opportunity to transfer know-how and perhaps start collaborating on these issues.


Continuing R&D

Working on scalability and performance improvements of CVMFS: P2P on the LAN, CDN on the WAN.

SLC5 compatibility will be addressed in CernVM 2.0.

CernVM as a job-hosting environment: ideally, users would like to run their applications on the grid (or cloud) infrastructure in exactly the same conditions in which they were originally developed. CernVM already provides the development environment and can be deployed on a cloud (EC2). One image supports all four LHC experiments, and it is easily extensible to other communities.


Multi-core and Virtualization Workshop

Last June we held a workshop on adapting applications and computing services to multi-core and virtualization, organized in conjunction with IT, with participation from vendors, the experiments, CERN-IT and Grid service providers.

The goals were: to familiarize ourselves with the latest technologies and current industry trends; to understand the experiments' applications with the new requirements introduced by virtualization and multi-core; and an initial exploration of solutions.

A number of follow-up actions were identified; they were discussed at the IT Physics Services meeting and are being followed up.


FUTURE DIRECTIONS


Projects Summary

The SFT activities can be summarized as: developing, testing and validating common software packages; software services for the experiments; user support and consultancy; providing people for certain roles in the experiments; and technology tracking and development.

Listening and responding to requests from the experiments: via the AF and other communication channels (e.g. LIM and AA meetings, collaboration meetings, etc.) and via direct participation in the experiments.

Following technology trends and testing new ideas: anticipating the future needs of the experiments; we must keep up to date with this rapidly changing field.


Main Challenges

Coping with the reduction of manpower: we are [and have been] forced to do more with fewer people.

Increasing convergence between the projects, to be more efficient and to give a more coherent view to our clients. Some successes so far: nightlies, testing, Savannah, web tools, etc. Many more things could be done, but they require temporary extra effort.

Incorporating new experiments/projects: a minimal critical mass is needed for each new activity.

Keeping motivation during the maintenance phase: we spend more time on user support and maintenance than on new developments.


SFT evolution (not revolution)

Support new experiments/projects (e.g. LCD, NA62).

Take into account possible new requirements to evolve some of the software packages

Provide people for certain roles (e.g. software architects, librarians, coordinators, etc.) to the new projects, so that they can leverage the expertise of the group.

The [AA] LCG structure has served us well until now and can continue with minor modifications: incorporate the new activities under the same structure; the monitoring and review process should also include the new activities.

Regrouping the AA activities in a single group: organize, manage and distribute all the software packages as the "new CERN Library"; host specialists in the MC and data analysis domains.


Regrouping the AA activities

Something we could follow up is the idea of re-grouping the AA activities into the SFT group.

Currently the development and maintenance of the Persistency Framework packages (POOL, CORAL, COOL) is hosted in the IT-DM group (~3 FTE). The rationale has been to be closer to the Physics DB services, but the level of integration of these developments with the rest of the packages (e.g. ROOT) has suffered in general.

Benefits: the activity can leverage the expertise in the group; better integration with the rest of the AA projects.


The New CERN Library

Provide a coherent and fairly complete set of software packages ready to be used by any existing or new HEP experiment: utility libraries, analysis tools, MC generators, full and fast simulation, reconstruction algorithms, etc. A reference place to look for needed functionality, made from internally and externally developed products.

[Fairly] independent packages, including some integrating elements such as 'dictionaries', 'standard interfaces', etc., with support for I/O, interactivity, scripting, etc.

Easy to configure, deploy and use by end-users; good packaging and deployment tools will be essential.


Hosting Specialists

SFT [CERN] does not have experts in many of the scientific or technical domains that could contribute to the 'content' of the new CERN Library; it is probably out of the question to hire people like Fred James.

We can overcome this by inviting/offering short visits or PJAS/PDAS contracts. E.g. Geant4 model development is mainly carried out by PJAS; e.g. Torbjorn Sjostrand was hosted by SFT during the development of Pythia8.

We just need good contacts and some additional exploitation budget


Manpower Plans

An in-depth planning exercise was prepared in 2005. It is still valid for the baseline program, but it does not take into account the R&D and possible new commitments.

Ideal level of resources needed by project (baseline):
Simulation: 5 STAF, 1 FELL, 1 TECH, 2 PJAS, 1 PDAS
ROOT: 6 STAF, 1 FELL, 1 TECH, 1 PJAS
SPI: 2 STAF, 1 FELL


[Charts: SFT FTEs by project (LCG:SPI, LCG:CORE, LCG:SIMU, LCG:MGT and grand total) and PH/SFT FTEs by contract type (FELL, PDAS, PJAS, STAF and grand total), for the years 2003-2015.]


Manpower Summary

We are at a level that is just sufficient for what we need to do (as planned in 2005).

Apart from the R&D activities, all the others are long-term activities: people ending their contracts or retiring should be replaced, and the Applied Fellow level should be maintained.

In principle, any new commitment should be accompanied by the corresponding additional resources.


Concluding Remarks

The "raison d'être" of the group, as well as its mandate, is still valid.

It is a very effective and efficient way of providing key scientific software components to the CERN experimental program

Instrumental to the leadership of CERN in HEP software.

The main focus will continue to be the LHC for many years, but we should slightly modify the scope to include new activities.

We should continue in the direction of increasing the synergy and coherency between the different projects hosted in the group.

The group is ready to face the challenges imposed by the physics analysis of the LHC experiments.

We should start planning how and what to incorporate from the R&D activities into the baseline.

The staffing level is just sufficient for what we need to do.
