SFT Group Review: Additional projects, future directions and overall planning
Outline: SPI project, Multi-core R&D (WP8), Virtualization R&D (WP9), other projects, vision, planning
September 30th, 2009
Software Process and Infrastructure: External libraries service
• The LHC experiments use about 100 external libraries (open-source and public-domain); see http://lcgsoft.cern.ch/
• Automated building and distribution of all packages from sources, for all the AA-supported platforms, and the process is fast
• Recently introduced new compilers and OSs (slc5, gcc 4.3, VS9, icc 11)

Release management of the AA software stack for the LHC experiments
• Coordination via the "Librarians and Integrators meeting" (low level) and the "Architects Forum" (high level)
• Over the last year we deployed two major release series (LCG 55 and 56), each with three bug-fix releases on top (a-c)
• Moving to new compilers (e.g. gcc 4.3 for slc5); optionally releasing parts of ROOT separately
• The release infrastructure is also used by experiments outside the LHC (DayaBay, Memphis, Dusel)
The LHC Software Stack
[Diagram: the layered LHC software stack. About 100 external packages (Python, Boost, Qt, Xerces, GSL, valgrind, Grid middleware, Java, ...) and the common AA projects (ROOT, POOL, COOL, CORAL, RELAX) underpin the experiment software (AliRoot, CMSSW, LHCb/Gaudi, ATLAS/Athena), built with gcc 3.4/4.0/4.3, icc 11, llvm 2.4 and vc 7.1/9, in 32- and 64-bit, on Linux (slc4, slc5), Mac OSX (10.5) and Windows (XP).]
Software Process and Infrastructure (2): Nightly build, testing and integration service
• Nightly builds build and test all the software on all AA-provided platforms (Scientific Linux, Windows, MacOS)
• Currently extending to new compiler suites (icc) to improve software robustness, and moving forward to Mac OSX 10.6
• Nightlies are used in "chains", with the LHC experiments building on top: a fast feedback loop on changes in the AA software
• Integration of the nightly builds with CernVM is almost finished

Collaborative tools (HyperNews, Savannah)
• Savannah is heavily used in the LHC and beyond (CERN/IT, Grid, etc.)
• The HyperNews service has been migrated to e-groups/SharePoint for all LHC experiments but CMS
• New effort towards an AA-wide web infrastructure based on Drupal (uniform look and feel, better integration between the sites)
• Infrastructure in general for the rest of the group
Savannah Usage
[Charts: Savannah usage from Oct-07 to Aug-09: postings per month (new bugs, new tasks), registered users, bugs per experiment/project type (ALICE, ATLAS, CMS, LHCb, CERN IT, EGEE, LCG AA, LHC Grid, other) and registered projects.]
MULTI-CORE R&D (WP8)
Main Goals
• Investigate software solutions to efficiently exploit the multi-core architectures of modern computers, addressing the problems the experiments will need to solve:
  - memory consumption, which will get worse as we go to higher luminosities
  - CPU efficiency, to keep up with computational needs
• Ongoing investigations cover four areas:
  - parallelization at event level using multiple processes
  - parallelization at event level using multiple threads
  - high-granularity parallelization of algorithms
  - optimization of memory access to adapt to the new memory hierarchy
• Collaboration established with the LHC experiments, Geant4, ROOT and OpenLab (CERN/IT)
Current Activities
• Close interaction with the experiments (bi-weekly meetings, reports in the AF)
• Workshops every six months (the latest in June, with IT, on "deployment")
• Training and working sessions in collaboration with OpenLab and Intel
• ATLAS has developed a prototype using fork and copy-on-write (COW)
• CMS is investigating the same route, and the use of OpenMP in algorithms
• The Gaudi team is investigating parallelization at the Python level
• Geant4 has developed a multi-threaded prototype
• ROOT is developing PROOF-Lite and the use of multiple threads in the I/O
• A parallel version of Minuit (using OpenMP and MPI) was already released in ROOT 5.24
• A library for using shared memory has been developed within the project
Parallelization of Gaudi Framework
• No change needed in user code or configuration
• Equivalent output (un-ordered events)
Exploit Copy on Write (COW)
• Modern OSs dynamically share read-only pages among processes; a memory page is copied and made private to a process only when it is modified
• Prototypes in ATLAS and LHCb
• Encouraging results as far as memory sharing is concerned (about 50% shared)
• Concerns about I/O (need to merge the output from multiple processes)

Memory (ATLAS): one process uses 700 MB VMem and 420 MB RSS. With COW:
(before) evt 0: private: 004 MB | shared: 310 MB
(before) evt 1: private: 235 MB | shared: 265 MB
. . .
(before) evt 50: private: 250 MB | shared: 263 MB
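A minimal sketch of the fork-and-COW approach, not the ATLAS or LHCb prototype itself; load_conditions(), process_event() and nevents are hypothetical stand-ins for the experiment framework:

```cpp
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-ins for the experiment framework:
const int nevents = 100;
void load_conditions() { /* load large read-only geometry/conditions data */ }
void process_event(int /*evt*/) { /* reconstruct one event */ }

int main() {
  load_conditions();                 // pages allocated here stay shared after fork()
  const int nworkers = 4;
  for (int w = 0; w < nworkers; ++w) {
    if (fork() == 0) {               // child: process its slice of the events
      for (int evt = w; evt < nevents; evt += nworkers)
        process_event(evt);          // only pages a worker writes to get copied
      _exit(0);
    }
  }
  while (wait(nullptr) > 0) {}       // parent: reap all workers
  return 0;
}
```

Since every worker writes its own output file, the merging concern above: the parent (or a dedicated process) must combine the per-worker outputs at the end of the job.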
See Sebastien Binet’s talk @ CHEP09
Exploit "Kernel Shared Memory" (KSM)
• KSM is a Linux driver that allows identical memory pages to be shared dynamically between one or more processes
• It was developed as a backend of KVM to help memory sharing between virtual machines running on the same host
• KSM scans only memory that has been registered with it; essentially, each memory allocation that is a candidate for sharing needs to be followed by a call to a registration function
• CMS reconstruction of real data (cosmics with the full detector): no code change; 400 MB private data, 250 MB shared data, 130 MB shared code
• ATLAS: no code change; in a reconstruction job of 1.6 GB VM, up to 1 GB can be shared with KSM
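On Linux the registration call is madvise() with MADV_MERGEABLE (available from kernel 2.6.32 with KSM enabled). A minimal sketch of registering a page-aligned allocation; the helper name is ours, not from the slide:

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdlib>

// Allocate a page-aligned block and mark it as a KSM merge candidate.
void* alloc_ksm_candidate(std::size_t size) {
  void* p = nullptr;
  if (posix_memalign(&p, 4096, size) != 0) return nullptr;
  // Tell the kernel these pages may be identical to pages elsewhere;
  // the KSM daemon scans registered ranges and merges duplicates.
  madvise(p, size, MADV_MERGEABLE);
  return p;
}
```

The "no code change" results above are possible because the registration can be hidden in the allocator, so the application itself never calls madvise() directly.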
Parallel MINUIT
• Minimization of a maximum likelihood or a χ2 requires iterative computation of the gradient of the NLL function
• Execution time scales with the number of free parameters θ and the number N of input events in the fit
• Two strategies for parallelizing the gradient and NLL calculation (see the sketches below):
  - gradient or NLL calculation on the same multi-core node (OpenMP)
  - distribute the gradient over different nodes (MPI) and parallelize the NLL calculation on each multi-core node (pthreads): a hybrid solution

(Alfio Lazzaro and Lorenzo Moneta)
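For concreteness, the quantities involved can be written as (a standard maximum-likelihood definition, not taken verbatim from the slide):

\[
\mathrm{NLL}(\theta) = -\sum_{i=1}^{N} \ln f(x_i;\theta),
\qquad
\frac{\partial\,\mathrm{NLL}}{\partial\theta_j} \approx
\frac{\mathrm{NLL}(\theta + h\,e_j) - \mathrm{NLL}(\theta - h\,e_j)}{2h}
\]

Each gradient component is independent and can be distributed over MPI ranks, while each NLL evaluation is a sum over the N events and parallelizes within a node. A minimal OpenMP sketch of that per-node sum, assuming a user-supplied pdf function; this is an illustration, not the actual code released in ROOT 5.24. Compile with -fopenmp:

```cpp
#include <cmath>
#include <vector>

// NLL(theta) = -sum_i ln f(x_i; theta), with the event loop split
// across threads and the partial sums combined by an OpenMP reduction.
double nll(const std::vector<double>& events,
           const std::vector<double>& theta,
           double (*pdf)(double, const std::vector<double>&)) {
  double sum = 0.0;
  const long n = static_cast<long>(events.size());
  #pragma omp parallel for reduction(+ : sum)
  for (long i = 0; i < n; ++i)
    sum += std::log(pdf(events[i], theta));
  return -sum;
}
```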
Multi-Core R&D: Outlook
• Recent progress shows that we shall be able to exploit the next generation of multi-core processors with "small" changes to HEP code
  - exploit copy-on-write (COW) in multi-processing (MP)
  - develop an affordable solution for sharing the output file
  - leverage the Geant4 experience to explore multi-threaded (MT) solutions
• Continue optimization of memory-hierarchy usage; study data and code "locality", including "core affinity"
• Expand the Minuit experience to other areas of "final" data analysis, such as machine-learning techniques; investigating the possibility of using GPUs and custom FPGAs
• "Learn" how to run MT/MP jobs on the grid; workshop at CERN, June 25th-26th: http://indico.cern.ch/conferenceDisplay.py?confId=56353
• Collaboration established with CERN/IT and LCG
Explore the New Frontier of Parallel Computing
• Scaling to many-core processors (96-core processors are foreseen for next year) will require innovative solutions:
  - MP and MT beyond the event level
  - fine-grained parallelism (OpenCL, custom solutions?)
  - parallel I/O
• WP8 is continuously transferring technologies and artifacts to the experiments: this will allow the LHC experiments to make the best use of current computing resources
• Computing technology will continue its route toward more and more parallelism: a continuous investment in following the technology and providing solutions adapted to the most modern architectures is instrumental to best exploit the computing infrastructure needed for the future of CERN
VIRTUALIZATION R&D (WP9)
The Problem: Software @ LHC
• Millions of lines of code
• Different packaging and software distribution models
• Complicated software installation/update/configuration procedures
• Long and slow validation and certification process
• Very difficult to roll out a major OS upgrade (SLC4 -> SLC5)
• Additional constraints imposed by grid middleware development
  - effectively locked to one Linux flavour
  - the whole process is focused on the middleware and not on the applications
• How can we effectively harvest the multi- and many-core CPU power of user laptops and desktops if LHC applications cannot run in such environments?
Virtualization R&D Project
• Aims to provide a complete, portable and easy-to-configure user environment for developing and running LHC data analysis, locally and on the Grid
  - code check-out, editing, compilation, small local tests, debugging, ...
  - Grid submission, data access, ...
  - event displays, interactive data analysis
  - no user installation of software required
  - suspend/resume capability
• Independent of the physical software and hardware platform (Linux, Windows, MacOS)
Virtualizing LHC Applications
• Starting from the experiment software...
• ...ending with a custom Linux image specialised for a given task
Developing CernVM
• Quick development cycles, in close collaboration with the experiments
• Very good feedback from enthusiastic users
• Planning presented at the kickoff workshop; first release ahead of plan (one year ago)

Release timeline:
• 2008: preparation; kickoff workshop; releases 0.5, 0.6, 0.7, 0.8, 0.91 (RC1) and 0.92 (RC2); on time!
• 2009: releases 1.2 and 1.3.3; 2nd workshop; release 1.3.4; CSC2009; release 1.4.0; release 1.6.0 (final); release 2.0 (SL5)

• Semi-production operation in 2009, with stable and development branches
• Dissemination: 2nd workshop (organized with IT); ACAT, HEPIX, CHEP09; CERN School of Computing 2009; tutorials and presentations to the experiments
Where Are CernVM Users?
[Map: CernVM users worldwide, about 1200 different IP addresses.]
Download & Usage Statistics
[Charts: number of downloads per CernVM version (0.1 through 1.2), and usage share by experiment: ATLAS 45%, ALICE 17%, CMS 7%, LHCb 28%, LCD 3%.]
Transition from R&D to Service
CernVM Infrastructure
• We have developed, deployed and are operating a highly available service infrastructure to support the four LHC experiments and LCD
• If CernVM is to be turned into a proper 24x7 service, this infrastructure will have to be moved to IT premises
• An opportunity to transfer know-how, and perhaps to start collaborating on these issues
Continuing R&D
• Working on scalability and performance improvements of CVMFS: P2P on the LAN, CDN on the WAN
• SLC5 compatibility will be addressed in CernVM 2.0
• CernVM as a job-hosting environment
  - ideally, users would like to run their applications on the grid (or cloud) infrastructure in exactly the same conditions in which they were originally developed
  - CernVM already provides the development environment and can be deployed on the cloud (EC2)
• One image supports all four LHC experiments, and is easily extensible to other communities
Multi-core and Virtualization Workshop
• Last June we held a workshop on adapting applications and computing services to multi-core and virtualization
  - organized in conjunction with IT
  - participation from vendors, the experiments, CERN-IT and Grid service providers
• The goals were to:
  - get familiar with the latest technologies and current industry trends
  - understand the experiments' applications and the new requirements introduced by virtualization and multi-core
  - make an initial exploration of solutions
• A number of follow-up actions were identified; they were discussed at the IT Physics Services meeting and are being followed up
FUTURE DIRECTIONS
Projects Summary
• The SFT activities can be summarized as:
  - development, testing and validation of common software packages
  - software services for the experiments
  - user support and consultancy
  - providing people for certain roles in the experiments
  - technology tracking and development
• Listening and responding to requests from the experiments
  - the AF and other communication channels (e.g. LIM and AA meetings, collaboration meetings, etc.)
  - direct participation in the experiments
• Following technology trends and testing new ideas
  - anticipating future needs of the experiments
  - we must keep up to date with this rapidly changing field
Main Challenges
• Coping with the reduction of manpower: we are [have been] forced to do more with fewer people
• Increasing convergence between the projects
  - to be more efficient and give a more coherent view to our clients
  - some successes so far: nightlies, testing, Savannah, web tools, etc.
  - many more things can be done, but they require temporary extra effort
• Incorporating new experiments/projects: a minimal critical mass is needed for each new activity
• Keeping motivation during the maintenance phase: we spend more time on user support and maintenance than on new developments
SFT Evolution (not revolution)
• Support new experiments/projects (e.g. LCD, NA62)
  - take into account possible new requirements in the evolution of some of the software packages
  - provide people for certain roles (e.g. software architects, librarians, coordinators, etc.) to the new projects, so that they can leverage the expertise of the group
• The [AA] LCG structure has served us well until now and can continue with minor modifications
  - incorporate the new activities under the same structure
  - the monitoring and review process should also include the new activities
• Regroup the AA activities in a single group
• Organize, manage and distribute all the software packages as the "new CERN Library"
• Host specialists in the MC and data-analysis domains
Regrouping the AA Activities
• Something we could follow up is the idea of regrouping the AA activities into the SFT group
• Currently the development and maintenance of the Persistency Framework packages (POOL, CORAL, COOL) is hosted in the IT-DM group (~3 FTE)
  - the rationale has been to be closer to the Physics DB services
  - the level of integration of these developments with the rest of the packages (e.g. ROOT) has suffered in general
• Benefits: the activity can leverage the expertise in the group, and integrate better with the rest of the AA projects
The New CERN Library
• Provide a coherent and fairly complete set of software packages ready to be used by any existing or new HEP experiment
  - utility libraries, analysis tools, MC generators, full and fast simulation, reconstruction algorithms, etc.
  - a reference place to look for needed functionality
• Made from internally and externally developed products
  - [fairly] independent packages, including some integrating elements such as 'dictionaries', 'standard interfaces', etc.
  - support for I/O, interactivity, scripting, etc.
• Easy to configure, deploy and use by end users; good packaging and deployment tools will be essential
Hosting Specialists
• SFT [CERN] does not have experts in many of the scientific and technical domains that could contribute to the 'content' of the new CERN Library
  - it is probably out of the question to hire people like Fred James
• We can overcome this by inviting people for short visits or offering PJAS/PDAS contracts
  - e.g. Geant4 model development is mainly carried out by PJAS
  - e.g. Torbjorn Sjostrand was hosted by SFT during the development of Pythia8
• We just need good contacts and some additional exploitation budget
Manpower Plans
• An in-depth planning exercise was prepared in 2005
  - still valid for the baseline programme
  - does not take into account the R&D and possible new commitments
• Ideal level of resources needed by project (baseline):
  - Simulation: 5 STAF, 1 FELL, 1 TECH, 2 PJAS, 1 PDAS
  - ROOT: 6 STAF, 1 FELL, 1 TECH, 1 PJAS
  - SPI: 2 STAF, 1 FELL
[Charts: SFT FTEs by project (LCG:SPI, LCG:CORE, LCG:SIMU, LCG:MGT and grand total) and PH/SFT FTEs by contract type (FELL, PDAS, PJAS, STAF and grand total), 2003-2015.]
Manpower Summary
• We are at a level that is just sufficient for what we need to do (as planned in 2005)
• Apart from the R&D activities, all the others are long-term activities
  - people ending their contracts or retiring should be replaced
  - the applied-fellow level should be maintained
• In principle, any new commitment should be accompanied by the corresponding additional resources
Concluding Remarks
• The "raison d'être" of the group, as well as its mandate, is still valid
  - it is a very effective and efficient way of providing key scientific software components to the CERN experimental programme
  - it is instrumental to the leadership of CERN in HEP software
• The main focus will continue to be the LHC for many years, but we should slightly modify the scope to include new activities
• We should continue in the direction of increasing the synergy and coherency between the different projects hosted in the group
• The group is ready to face the challenges posed by the physics analysis of the LHC experiments
• We should start planning how and what to incorporate from the R&D activities into the baseline
• The staffing level is just sufficient for what we need to do