Summary of Category 3: HENP Computing Systems and Infrastructure
Ian Fisk and Michael Ernst
CHEP 2003
March 28, 2003
CAS 2002-10-24 Ian M. Fisk, UCSD
Introduction
We tried to break the week into themes
- We discussed fabrics and architectures on Monday
  - Heard general talks about building and securing large multi-purpose facilities, as well as updates from a number of HENP computing efforts
- We discussed emerging hardware and software technology on Tuesday
  - Review of the most recent PASTA report and an update on commodity disk storage work
  - Software for flexible clusters: MOSIX
  - Advanced storage and data serving: CASTOR, Enstore, dCache, Data Farm, and ROOT I/O
- We discussed Grid and other services on Thursday
  - Grid interfaces and storage management over the grid
  - Monitoring services
It was a full week with a lot to discuss. Special thanks to all those who presented.
There is no way to cover very much of what was presented in a thirty-minute talk.
General Observations
Grid functionality is coming quickly
- Basic underlying concepts of distributed, parasitic, and multi-purpose computing are already being deployed in running experiments
- Early implementation of interfaces for grid services to fabrics
- I would expect that, by the time the LHC experiments have real data, the tools and techniques will have been well broken in by experiments running today
The shift to commodity equipment has accelerated since the last CHEP
- I would argue that the shift is nearly complete
- At least two large computing centers admitted to having nothing in their work rooms but Linux systems and a few Suns for debugging software
- This has resulted in the development of tools to help handle this complicated environment
With notable exceptions, high energy physics computing does not work well together
- The individual experiments often have subtly different requirements, which results in completely independent development efforts
Distributed Computing
Example from CDF: the Central Analysis Facility is very well used
- The plan for the very near future is to deploy satellite analysis farms to increase the computing resources
Distributed Computing
Peter Elmer presented how the BaBar experiment has been able to take advantage of distributed computing resources for primary event reconstruction
- By splitting their prompt calibration and event reconstruction, they now take advantage of 5 reconstruction farms at SLAC and 4 in Padova
Parasitic Computing
Bill Lee presented the CLuED0 work of the D0 experiment
- CLuED0 is a cluster of D0 desktop machines which, along with some custom management software, parasitically provides D0 with 50% of their analysis CPU cycles
- Heterogeneous system with distributed support

The US LHC experiments submitted a proposal on Monday which, among many other topics, discussed the use of economic theories to optimize resource allocations
- Techniques already used in D0
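The proposal itself is not reproduced here, but the core idea of economically motivated resource allocation can be illustrated with a toy auction scheduler: consumers bid for CPU slots, and the highest bidders are served first. This is an invented sketch of the general idea, not the scheme actually used by D0 or the US LHC experiments; the group names, bid model, and `allocate_by_bids` function are all illustrative.

```python
def allocate_by_bids(total_slots, bids):
    """Toy 'computing economy': grant CPU slots to the highest bidders first.

    bids: list of (group, slots_wanted, bid_per_slot) tuples.
    Returns {group: slots_granted}. Purely illustrative of using an
    economic mechanism to set resource-allocation priorities.
    """
    granted = {}
    remaining = total_slots
    # Serve requests in order of decreasing bid price.
    for group, wanted, price in sorted(bids, key=lambda b: -b[2]):
        take = min(wanted, remaining)
        if take:
            granted[group] = take
            remaining -= take
    return granted

demo = allocate_by_bids(
    100,
    [("atlas", 60, 1.0), ("cms", 60, 1.5), ("d0", 30, 0.5)],
)
print(demo)  # cms bid highest, so it is served first; atlas gets the remainder
```

A real system would of course add budgets, time windows, and fairness constraints; the point is only that prices give the scheduler an ordering.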
Multipurpose Computing
Fundamental to a grid-connected facility is the ability to support multiple experiments at a minimum, and ideally multiple disciplines
- The people responsible for computing systems have been thinking about how to make this possible, because so many regional computing centers have to support multiple experiments and user communities
- John Gordon gave an interesting talk on whether it is possible to build a multipurpose center
- John identified six categories of problems and discussed possible solutions: software levels, ‘experts’, local rules, security, firewalls, and the accelerator centres
Early Interfacing of Grid Services to Fabrics
Alex Sim gave a talk on the Storage Resource Manager (SRM) functionality
- Manage space
  - Negotiate and assign space to users; manage the “lifetime” of spaces
- Manage files on behalf of a user
  - Pin files in storage until they are released; manage the “lifetime” of files
  - Manage the action taken when pins expire (depends on file type)
- Manage file sharing
  - Policies on what should reside on a storage resource at any one time
  - Policies on what to evict when space is needed
  - Get files from remote locations when necessary
- Purpose: to simplify the client’s task
- Manage multi-file requests
  - A brokering function: queue file requests, pre-stage when possible
- Provide grid access to/from mass storage systems
  - HPSS (LBNL, ORNL, BNL), Enstore (Fermilab), JasMine (JLab), CASTOR (CERN), MSS (NCAR), …
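The pin/release/lifetime behavior listed above is the heart of the design, and a toy model makes it concrete. This is a hedged sketch of the idea only, not the actual SRM interface or any of its implementations; the class name, method names, and capacity units are all invented for illustration.

```python
import time

class StorageResourceManagerSketch:
    """Toy model of SRM-style pinning: files stay on disk while pinned,
    and become eviction candidates once their pin lifetime expires."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.files = {}  # name -> {"size_gb": float, "pin_expires": float}

    def used_gb(self):
        return sum(f["size_gb"] for f in self.files.values())

    def pin(self, name, size_gb, lifetime_s):
        """Stage a file and pin it for lifetime_s seconds, evicting
        expired files first if space is needed."""
        now = time.time()
        if self.used_gb() + size_gb > self.capacity_gb:
            self._evict_expired(now, needed_gb=size_gb)
        if self.used_gb() + size_gb > self.capacity_gb:
            raise RuntimeError("no space: all resident files are still pinned")
        self.files[name] = {"size_gb": size_gb, "pin_expires": now + lifetime_s}

    def release(self, name):
        """Release a pin immediately, making the file evictable."""
        if name in self.files:
            self.files[name]["pin_expires"] = 0.0

    def _evict_expired(self, now, needed_gb):
        # Drop expired files until enough space is free for the request.
        expired = [n for n, f in self.files.items() if f["pin_expires"] <= now]
        for n in expired:
            del self.files[n]
            if self.capacity_gb - self.used_gb() >= needed_gb:
                break

srm = StorageResourceManagerSketch(capacity_gb=10)
srm.pin("run1.root", size_gb=6, lifetime_s=3600)
srm.pin("run2.root", size_gb=3, lifetime_s=3600)
srm.release("run1.root")
srm.pin("run3.root", size_gb=6, lifetime_s=3600)  # evicts the released run1.root
print(sorted(srm.files))  # ['run2.root', 'run3.root']
```

The real service adds negotiation, multi-file request queues, and the mass-storage back ends listed above, but the client-visible contract is essentially this pin-with-lifetime model.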
Early Implementation
The functionality of SRM is impressive and leads to interesting analysis scenarios
Equally interesting is the number of places that are prepared to interface their storage to the WAN using SRM
- Robust file replication between BNL and LBNL
Shift to commodity equipment
Benefits and Complications
The benefit is very substantial computing resources at a reasonable hardware cost.
The complication is the scale and complexity of the commodity computing cluster
- A reasonably big computing cluster today might be 1000 systems, with all the possible hardware problems associated with 1000 systems bought from the lowest bidder
- A considerable amount of deployment, integration, and development effort is needed to create tools that allow a shelf or rack of Linux boxes to behave like a computing resource: configuration tools, monitoring tools, tools for systems control, scheduling tools, and security techniques
Configuration Tools
We heard an interesting talk from Thorsten Kleinwort on installing and running systems at CERN
- Systems are installed with Kickstart and RPMs
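For readers who have not seen it: Kickstart is Red Hat's unattended-install answer file, paired with a list of RPM packages to install. The fragment below is purely illustrative of the format, not CERN's actual configuration; the partition sizes, password hash, and package choices are invented.

```
# Illustrative Red Hat Kickstart fragment (not CERN's actual config)
install
lang en_US
keyboard us
rootpw --iscrypted $1$examplehash
clearpart --all
part / --size 4096

%packages
openssh-server
```

Combined with a per-host package list kept under RPM, this lets a node be reinstalled from scratch without interactive input.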
CERN and several other centers are deploying the configuration tools from EDG WP4
- Pan & CDB (Configuration Data Base) for describing hosts
- Pan is a very flexible language for describing host configuration information:
  - Expressed in templates (ASCII)
  - Allows includes (inheritance)
- Pan is compiled into XML, stored inside CDB
- The XML is downloaded, and the information is provided by CCConfig, the high-level API
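To give a flavor of what such a template looks like, here is a rough sketch in the style of the Pan language. The template name, the included template, and all paths and values are invented for illustration, and real Pan syntax may differ in detail.

```
# Sketch of a Pan-style template (names and values are invented)
template profile_examplenode;

include base_linux;   # inheritance via includes

"/system/network/hostname" = "examplenode";
"/software/packages" = list("openssh", "afs-client");
```

Each template assigns values to paths in a configuration tree; the compiler merges the included templates and emits one XML profile per host into CDB.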
It is complicated even to track what it is you have.
- We had an interesting presentation from Jens Kreutzkamp from DESY about how they track their IT assets.
Monitoring Tools
Systems are complicated, consisting of many components; this has led to the development of a lot of monitoring tools
- Very functional, complete, and scalable (though complicated to extend) tools like NGOP, which Tanya Levshina presented
- SystemStatusPage
Monitoring Tools (cont.)
On the opposite end were examples of extremely lightweight monitoring packages for BaBar, presented by Matthias Wittgen
- Monitors CPU and network usage, as well as packets sent to disk and the number of processes
- Writes the data to a central server, where it is kept in a flat file
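A monitor this lightweight is easy to sketch. The code below is an invented illustration of the pattern described (sample a few host metrics, append one record per line to a flat file); it is not BaBar's actual package, and the function names, metric set, and JSON-lines format are assumptions.

```python
import json
import os
import time

def collect_sample():
    """Sample a few host metrics. Load average comes from the OS where
    available; the process count is read from /proc on Linux, else -1."""
    load1 = os.getloadavg()[0] if hasattr(os, "getloadavg") else -1.0
    try:
        nprocs = sum(1 for entry in os.listdir("/proc") if entry.isdigit())
    except OSError:
        nprocs = -1
    return {"ts": int(time.time()), "load1": load1, "nprocs": nprocs}

def append_to_flat_file(path, sample):
    """Append one record per line; the 'central flat file' of the talk
    would live on a server, but here it is just a local path."""
    with open(path, "a") as f:
        f.write(json.dumps(sample) + "\n")

# Usage (run periodically, e.g. from cron):
#   append_to_flat_file("monitor.log", collect_sample())
```

The appeal of this design is that there is no database and no daemon on the server side: the flat file can be tailed, grepped, or plotted directly.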
Tools for system control
Andras Horvath presented a technique for secure system control and reset access at a reasonable cost
- This solution doesn't scale to 6000 boxes
- The system Andras is implementing consists of serial connections for console access, and relays attached to the reset switch on the motherboard for resets
Security Techniques
The number of systems in these large commodity clusters makes for interesting security work
- Doubly so when worrying about providing grid interfaces

The work to secure the BNL facility was presented
- Work on prioritizing their assets and forming responses to security breaches
Field doesn’t cooperate well
This is not necessarily a problem, nor is it a criticism; it is simply an observation
- One doesn't see a lot of common detector-building projects; maybe it isn't surprising that there aren't a lot of common computing development efforts
- I noticed during the week that there is a lot of duplication of effort, even between experiments that are geographically close
- We have forums for exchange like HEPiX and the Large Cluster Workshop meetings
  - Even with these, we don't seem to do much development in common

There are notable exceptions
- Alan Silverman presented the work to write a guide to building and operating a large cluster
- Their noble, if somewhat ambitious, goal is to “Produce the definitive guide to building and running a cluster - how to choose, acquire, test and operate the hardware; software installation and upgrade tools; performance mgmt, logging, accounting, alarms, security, etc, etc”
Grid Projects
The grid projects are another area in which the field is working effectively together
- A number of sites indicated the desire to use common tools developed by EDG Work Package 4
- Good buy-in from fabric managers on the use of SRM
- Software deployment through the VDT
Conclusions
It was a long and interesting week

Apologies for not being able to summarize everything
- We had very interesting discussions and presentations yesterday about how to interface the fabrics and the grid services
- I also didn't get a chance to cover some of the hardware and software R&D results

I encourage people to look at the web page. Almost all the talks were posted.