Technical Evolution in LHC Computing (with an ATLAS slant)
Torre Wenaus
Brookhaven National Laboratory
July 18, 2012
GRID’2012 Conference
JINR, Dubna
[Image: Fabiola Gianotti, ATLAS Spokesperson, Higgs search seminar, July 4 2012, CERN]
This Talk
- LHC computing has been an outstanding success: experiments, grids, facilities, middleware providers, and CERN IT collaborated closely and delivered
- Where do we go from here? Drawing on what we've learned, and the technical evolution around us, what tools and technologies will best enable continued success?
- A hot topic as we approach the long shutdown of 2013-14
- Spring 2011: ATLAS established R&D programs across key technical areas
- Fall 2011: WLCG established Technical Evolution Groups (TEGs) across similar areas
- This talk draws on both in surveying LHC computing technical evolution activities and plans, subjectively rather than comprehensively
Computing Model Evolution in ATLAS
- Originally: static, strict hierarchy; multi-hop data flows; lesser demands on Tier 2 networking; virtue of simplicity
- Today: flatter, more fluid, mesh-like; sites contribute according to capability; greater flexibility and efficiency; more fully utilizes available resources
Principal enabler for the evolution: the network, with its excellent bandwidth, robustness, reliability, and affordability.
Requirements 2013+, in Light of Experience and Practical Constraints
- Scaling up: processing and storage growth with higher luminosity, energy, trigger rate, analysis complexity, ... with an ever increasing premium on efficiency and full utilization as resources get tighter and tighter
- Greater resiliency against site problems, especially in storage/data access, the most common failure mode
- Effectively exploit new computer architectures, dynamic networking technologies, and emergent technologies for large scale processing (NoSQL, clouds, whatever's next...)
- Maximal commonality: common tools, avoid duplicative efforts, consolidate where possible, for efficiency in cost and manpower
- All developments must respect continuity of operations; life goes on even in shutdowns
WLCG Technical Evolution Groups (TEGs)
- TEGs established Fall 2011 in the areas of databases, data management and storage, operations, security, and workload management
- Mandate: survey present practice, lessons learned, future needs, and the technology evolution landscape, and recommend a 2-5 year technical evolution strategy, consistent with continuous smooth operations and with a premium on commonality
- Reports last winter; follow-on activities recently defined
- Here follows a non-comprehensive survey across the technical areas of the principal TEG findings (relevant to technology evolution) and other technical developments and trends, with some ATLAS examples of technical evolution in action
Databases
- Great success with CORAL, the generic C++ interface to back end DBs (Oracle, Frontier, MySQL, ...). Support will be ensured, together with the overlaid COOL conditions DB
- Conditions databases evolved early away from direct reliance on the RDBMS; highly scalable thanks to Frontier, a web service interfacing Oracle to distributed clients served by proxy caches via a RESTful HTTP protocol
- Establishing Frontier/Squid as a full WLCG service
- Distributed apps (NoSQL):
  - Oracle as the big iron DB will not change; use ancillary services like Frontier and NoSQL to offload and protect it
  - NoSQL: R&D and prototyping has led to convergence on Hadoop as a NoSQL 'standard', at least for the moment
  - CERN is supporting a small instance now in pre-production; ATLAS is consolidating its NoSQL apps around Hadoop
Hadoop in ATLAS
- Distributed data processing framework: filesystem (HDFS), processing framework (MapReduce), distributed NoSQL DB (HBase), SQL front end (Hive), parallel data-flow language (Pig), ...
- Hadoop in the ATLAS data management system:
  - Aggregation and mining of log files: HDFS (3.5 TB) and MapReduce (70 min processing at 70 MB/s I/O); a minimal sketch follows below
  - Usage trace mining: HBase (300 insertions/sec, 80 GB/month, MapReduced in 2 min); migrated from Cassandra in two days
  - File sharing service: copied to HDFS and served via Apache
  - Wildcard file metadata search: HDFS and MapReduce
  - Accounting: a Pig data pipeline MapReduces 7 GB down to 500 MB of output summaries for Apache publication in 8 min
- Stable, reliable, fast, easy to work with, robust against hardware failure
- Complements Oracle by reducing load and expanding mining capability
- Future potential: storage system, parallel access to distributed data
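To make the MapReduce pattern behind the log mining concrete, here is a minimal Hadoop Streaming sketch in Python. The log layout (severity in the third whitespace-separated field) and the paths are assumptions for illustration, not the actual ATLAS DDM schema.

    #!/usr/bin/env python
    # mapper.py -- minimal Hadoop Streaming mapper (illustrative sketch).
    # Assumes one log record per line with the severity level in the third
    # whitespace-separated field; the real ATLAS DDM log layout may differ.
    import sys

    for line in sys.stdin:
        fields = line.split()
        if len(fields) >= 3:
            # Emit (severity, 1); Hadoop groups and sorts by key before the reducer.
            print("%s\t1" % fields[2])

    #!/usr/bin/env python
    # reducer.py -- sums the counts emitted by mapper.py for each severity key.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

Such a pair would be run with the Hadoop Streaming jar (input and output paths adjusted to the local installation); the same scripts also work as a plain shell pipeline for testing outside Hadoop.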
Data Management and Storage Evolution
- Federated storage based on xrootd for easy to use, homogeneous, single-namespace data access (a usage sketch follows below); NFS 4.1 is an option in principle but not nearly ready
- Support for HTTP storage endpoints and an HTTP data access protocol
- Wide area direct data access, to simplify data management and support simpler, more efficient workflows
- Caching data dynamically and intelligently for efficient storage usage, minimal latencies, and simplified storage management
- Point to point protocols needed: GridFTP, xrootd, HTTP, (S3?), without SRM overheads
- Adapt the FTS managed transfer service to new requirements: endpoint based configuration, xrootd and HTTP support (FTS3)
- On their way out: the LFC file catalog and the SRM storage interface (except for archive)
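As a sketch of what single-namespace access through a federation looks like from a user job, assuming PyROOT; the redirector hostname, file path and tree name below are placeholders rather than real ATLAS endpoints.

    # Open a file by its global name through an xrootd federation redirector
    # instead of copying it locally; the redirector transparently locates a
    # replica. Hostname, path and tree name are illustrative placeholders.
    import ROOT

    url = "root://federation-redirector.example.org//atlas/data/some_file.root"
    f = ROOT.TFile.Open(url)
    if f and not f.IsZombie():
        tree = f.Get("events")
        print("Opened %s, %d entries" % (url, tree.GetEntries()))
    else:
        print("Could not open %s via the federation" % url)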
Data Placement with Federation
[Diagram from the storage & data management TEGs]
Why xrootd as Basis for Storage Federation?
- Xrootd is mature, proven for over a decade, and tuned to HEP requirements
- Stable, responsive support
- Commonality (ALICE, ATLAS, CMS): ALICE has run xrootd in production for years, the CMS federation is in pre-production, and the ATLAS federation is emerging from R&D
- Very efficient at file discovery
- Transparent use of redundant replicas to overcome file unavailability; can be used to repair placed data
- Works seamlessly with the ROOT data formats used for event data, with close ROOT/xrootd collaboration
- Supports efficient WAN data access
- Interfaces to hierarchical caches and stores
- Uniform interface to back end storage: dCache, DPM, GPFS, HDFS, xrootd, ...
- Good monitoring tools
[Map: the current ATLAS xrootd federation, now expanding to the EU]
WAN Data Access and Caching
- I/O optimization work in ROOT and the experiments in recent years makes WAN data access efficient and viable
- Caching at source and destination further improves efficiency:
  - Source: migrate frequently used data to a fast (e.g. SSD) cache
  - Destination: an addressable shared cache reduces latency for jobs sharing or reusing data and reduces load on the source; requires cache awareness in the brokerage to drive re-use
- Asynchronous pre-fetch at the client improves things further, so WAN latencies don't impede processing; now supported by ROOT (a sketch of the client-side settings follows below)
- A major simplification to data and workflow management: no more choreography in moving data to processing or vice versa
- Demands ongoing I/O performance analysis and optimization, knowledge of access patterns (e.g. fraction of file read), and I/O monitoring; collaborative work between the ROOT team and the experiments
- Will benefit from integrating network monitoring into workflow brokerage
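A minimal PyROOT sketch of the client-side I/O settings involved: a TTreeCache so that many baskets travel per network round trip, plus the asynchronous prefetch switch mentioned above. The file URL, tree name and cache size are illustrative assumptions.

    # WAN-friendly reading: TTreeCache batches basket reads, and asynchronous
    # prefetching overlaps the next cache fill with processing of the current one.
    # The remote URL and tree name are placeholders, not real ATLAS data.
    import ROOT

    ROOT.gEnv.SetValue("TFile.AsyncPrefetching", 1)   # background prefetch thread

    f = ROOT.TFile.Open("root://remote-site.example.org//data/sample.root")
    tree = f.Get("events")
    tree.SetCacheSize(30 * 1024 * 1024)   # 30 MB TTreeCache
    tree.AddBranchToCache("*", True)      # cache every branch that gets read

    for i in range(int(tree.GetEntries())):
        tree.GetEntry(i)                  # served from the cache, not basket by basket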
Operations & Tools
- Strong desire for "Computing as a Service" (CaaS), especially at smaller sites:
  - Easy to install, easy to use, standards-compliant packaged services managed and sustained with very low manpower
  - Graceful reactions to fluctuations in load and user behavior
  - Straightforward and rapid problem diagnosis and remedy
  - "Cloud is the ultimate CaaS"; being acted on in ATLAS, where cloud-based Tier 3s are currently being prototyped
- More commonality among tools: availability monitoring, site monitoring, network monitoring
- Carefully evaluate how much middleware has a long term future
- Adopt CVMFS for use as a shared software area at all WLCG sites; in production for ATLAS and LHCb, in deployment for CMS
[Chart: ATLAS CVMFS deployment status; the aim is 100% of sites by end 2012]
CERNVM File System (CVMFS)
- Caching HTTP file system optimized for software delivery
- Efficient: compression over the wire, duplicate detection
- Scalable: works hand in hand with proxy caches
- Originally developed as a lightweight distributed file system for CERNVM virtual machines: keep the VM footprint small and provision software transparently via HTTP and FUSE, caching exactly what you need
- Adopted well beyond the CERNVM community and very successful for general software (and other file data) distribution: ATLAS, CMS, LHCb, Geant4, LHC@Home 2.0, ...
- Currently 75M objects, 5 TB in the CERN repositories
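From the client side no special code is needed: a repository simply appears as a read-only POSIX tree under /cvmfs, mounted via FUSE, with files fetched over HTTP and cached locally on first access. A trivial check, with an illustrative repository name:

    # CVMFS repositories look like ordinary read-only directories under /cvmfs;
    # only the catalogs and files actually touched are fetched and cached.
    # The repository name below is illustrative.
    import os

    repo = "/cvmfs/atlas.cern.ch"
    if os.path.isdir(repo):
        print("\n".join(sorted(os.listdir(repo))[:10]))   # peek at the top level
    else:
        print("%s is not mounted on this node" % repo)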
Security
- No longer any security through obscurity, and the lack of obscurity makes security incidents all the worse (press and publicity)
- Need fine-grained traceability to quickly contain and investigate incidents and prevent recurrence
- Bring logging up to standard across all services
- Address in the future through private clouds? A VM is isolated and is itself fully traceable
- Incorporate identity federations (next slide); centralized identity management also enables easy banning
Identity Management
[Diagrams (R. Wartel): traditional vs. federated access to grid services]
- Identity management today: complex for users, admins, VOs, and developers; ID management is fragmented and generally non-local; a grid identity is not easily used in other contexts
- Federated ID management: common trust/policy framework; single sign-on; ID managed in one place; the ID provider manages attributes and the service provider consumes them; supports services besides the grid: collaborative tools, wikis, mailing lists, ...
- A WLCG pilot project is getting started
Workload Management
- Pilots, pilots everywhere! The approach spread from ALICE and LHCb to all experiments
- Experiment-level workload management systems are successful; no long term need for the WMS as a middleware component
- But is WMS commonality still possible then? Yes:
  - DIRAC (LHCb) is used by other experiments, as is AliEn (ALICE)
  - PanDA (ATLAS) has new (DOE) support to generalize as an exascale computing community WMS
  - ATLAS, CMS and CERN IT are collaborating on a common analysis framework drawing on PanDA and glideinWMS (pilot layer)
- Pilot submission and management works well on a foundation of Condor and glideinWMS (which is itself a layer over Condor); see the pilot sketch below
- Can the CE be simplified? Under active discussion
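The pilot idea itself is simple, and can be caricatured in a few lines: a generic job lands on a worker node, then pulls its real payload from the experiment's central queue. The dispatcher URL and message fields below are invented for illustration; this is not the real PanDA, DIRAC or AliEn protocol.

    # Schematic pilot: started by the batch system with no fixed payload, it asks
    # the central workload management server for work matched to this resource,
    # runs it, and reports back. Endpoint and JSON fields are hypothetical.
    import subprocess
    import requests

    DISPATCHER = "https://wms.example.org"   # placeholder, not a real WMS endpoint

    def run_pilot():
        job = requests.get(DISPATCHER + "/getJob",
                           params={"site": "EXAMPLE_SITE"}).json()
        if not job:
            return                           # nothing queued; exit quietly
        status = subprocess.call(job["command"], shell=True)
        requests.post(DISPATCHER + "/report",
                      data={"jobid": job["id"], "status": status})

    if __name__ == "__main__":
        run_pilot()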
ATLAS/CMS Common Analysis Framework Proposed Architecture
[Architecture diagram: common workflow engine]
Status: CMS able to submit PanDA jobs to the ATLAS PanDA server
Workload Management 2
- Multi-core and whole node support: support is pretty much there for simple usage modes, and simplicity is the aim, in particular whole node scheduling
- Extended environment information (e.g. HS06, job lifetime): environment variables providing homogeneous information access across batch system flavors (we could have done with this years ago); a sketch follows below
- Virtualization and cloud computing:
  - Virtual CE: better support for "any" batch system; essential
  - Virtualization is clearly a strong interest for sites, both for service hosting and for WNs
  - Cloud computing interest and activity levels vary among experiments; ATLAS is well advanced in integrating and testing cloud computing in real analysis and production workflows, thanks to a robust R&D program running since spring 2010
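A sketch of how a payload could consume such environment variables for whole node running, assuming the machine/job features convention in which a variable such as MACHINEFEATURES points to a directory of small descriptive files; the exact variable and file names here are assumptions.

    # Size the worker pool from information the site publishes through the
    # (assumed) MACHINEFEATURES directory, falling back to the visible core
    # count if the mechanism is not available on this node.
    import multiprocessing
    import os

    def allocated_slots():
        features = os.environ.get("MACHINEFEATURES")
        if features:
            try:
                with open(os.path.join(features, "jobslots")) as f:
                    return int(f.read().strip())
            except (IOError, ValueError):
                pass
        return multiprocessing.cpu_count()

    def process_chunk(i):
        return i            # stand-in for the real per-core payload

    if __name__ == "__main__":
        n = allocated_slots()
        pool = multiprocessing.Pool(n)
        print(pool.map(process_chunk, range(n)))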
Cloud Computing in ATLAS
Several use cases prototyped or in production:
- PanDA based data processing and workload management: centrally managed queues in the cloud, elastically expanding resources transparently to users (see the sketch below)
- Institute managed Tier 3 analysis clusters, hosted locally or (more efficiently) at a shared facility, T1 or T2
- Personal analysis queues: user managed, low complexity (almost transparent), transient
- Data storage: transient caching to accelerate cloud processing; object storage and archiving in the cloud
[Plot: PanDA production in the cloud; concurrent jobs at Canadian sites]
[Diagram: personal PanDA analysis site prototype on Amazon EC2, with a personal pilot factory, an EC2 analysis cloud, and ActiveMQ monitoring of the Amazon analysis cloud status]
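A minimal sketch of the elastic expansion idea through the EC2 API with boto; the AMI, key pair, instance type and contextualization script are placeholders, not the configuration ATLAS actually uses.

    # Add worker-node VMs to a cloud-resident queue when demand grows; each VM
    # boots a (hypothetical) contextualization script that starts pulling pilots.
    # All identifiers below are illustrative only.
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")   # credentials from the environment

    USER_DATA = """#!/bin/bash
    # contextualization: configure the node and start the pilot factory
    /opt/cloud/start-pilots.sh
    """

    reservation = conn.run_instances(
        "ami-00000000",                # placeholder worker-node image
        min_count=1, max_count=10,     # expand by up to ten nodes at a time
        instance_type="m1.large",
        key_name="cloud-worker",
        user_data=USER_DATA)

    for instance in reservation.instances:
        print("started %s" % instance.id)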
Networking
- The network has been key to the evolution to date, and will be key in the future as well: dynamic intelligent networks, comprehensive monitoring, and experiment workflow and dataflow systems integrated with network monitoring and controls
- Network-aware brokerage
- Recent WLCG agreement to establish a working group on better integrating networking as an integral component of LHC computing
- LHCONE: progressively coming online to provide entry points for T1, T2 and T3 sites into a dedicated LHC network fabric, complementing LHCOPN as the network that serves T0-T1 traffic
LHCONE and Dynamic Networks
- The LHCONE initiative was established in 2010 to (first and foremost) better support and integrate T2 networking, given the strong role of T2s in the evolving LHC computing models: managed, segregated traffic for guaranteed performance; traffic engineering and flow management to fully utilize resources; leveraging advanced networking
- LHCONE services being constructed:
  - Multipoint virtual network (logical traffic separation)
  - Static and dynamic point to point circuits for guaranteed bandwidth, high-throughput data movement
  - Comprehensive end-to-end monitoring and diagnostics across all sites and paths
- Software Defined Networking (OpenFlow) is the probable technology for LHCONE in the long term; under investigation as R&D, converging on a 2 year timescale (the end of the shutdown)
- In the near term, establish a point-to-point dynamic circuits pilot
Conclusions
Focus for the future:
- Leverage the network
- Monitoring, diagnostics
- Robustness, reliability
- Simplicity, ease of use
- Support, sustainability
- Pilots, Condor
- Federation
- Oracle + NoSQL
- Caching, WAN access
- HTTP, RESTful APIs
- Virtualization
- Commonality
- Open standards
- Leverage beyond HEP
- Left out of the focus areas: scalability. Of course it is a principal one, but the success of LHC computing to date shows that we have a handle on scalability
- Thanks to the success to date, the future will be evolutionary, not revolutionary
- For the future, focus on the technical tools and approaches most key to having achieved scalability and manageability, and on the new tools and technologies with the greatest promise to do the same in the future, so as to ensure sustainability in the face of ever more challenging requirements and shrinking manpower
- On the foundation of the powerful networking that we are still learning to exploit to its fullest, and on the foundation of the Grid, which has served us well and is evolving with us
- Farewells to: LFC, WMS, SRM (mostly)
Thanks
This talk drew on presentations, discussions, comments and input from many people. Thanks to all, including those I've missed: Artur Barczyk, Doug Benjamin, Ian Bird, Jakob Blomer, Simone Campana, Dirk Duellmann, Andrej Filipcic, Rob Gardner, Alexei Klimentov, Mario Lassnig, Fernando Megino, Dan Van Der Ster, Romain Wartel, ...
Related Talks at GRID’12
- I. Bird (Monday)
- A. Klimentov, Distributed computing challenges, HEP computing beyond the grid (Tuesday)
- A. Vaniachine, Advancements in Big Data Processing (Tuesday)
- O. Smirnova, Implementation of common technologies in grid middleware (Tuesday)
- A. Tsaregorodtsev, DIRAC middleware for distributed computing systems (Tuesday)
- K. De, Status and evolution of ATLAS workload management system PanDA (Tuesday)
- ATLAS computing workshop (Tuesday)
- D. Duellmann, Storage strategy and cloud storage evaluations at CERN (Wednesday)
More Information
- WLCG Technical Evolution Groups https://twiki.cern.ch/twiki/bin/view/LCG/WebHome
- Frontier talks at CHEP 2012:
  - Comparison to NoSQL, D. Dykstra http://goo.gl/n0fcp
  - CMS experience, D. Dykstra http://goo.gl/dYovr
  - ATLAS experience, A. Dewhurst http://goo.gl/OIXsH
- NoSQL & Hadoop in ATLAS DDM, M. Lassnig, CHEP 2012 http://goo.gl/0ab4I
- Xrootd http://xrootd.slac.stanford.edu/
- Next generation WLCG File Transfer Service (FTS), Z. Molnar, CHEP 2012 http://goo.gl/mZWtp
- Evolution of ATLAS PanDA System, T. Maeno, CHEP 2012 http://goo.gl/lgDfU
- AliEn http://alien2.cern.ch/
- DIRAC http://diracgrid.org/
- CVMFS http://cernvm.cern.ch/portal/filesystem
- Status and Future Perspectives of CVMFS, J. Blomer, CHEP 2012 http://goo.gl/82csr
- LHCONE http://lhcone.web.cern.ch/
Supplementary
Frontier/Squid for Scalable Distributed DB Access
- The LHC experiments keep time-dependent data on detector conditions, alignment etc. in Oracle at CERN
- ATLAS alone runs 120,000+ jobs concurrently around the world, and many of these need conditions data access
- How to provide scalable access to centrally Oracle-resident data? The answer, first developed by CDF/CMS at FNAL and adopted for the LHC: Frontier
- Frontier is a web service that translates DB queries into HTTP, with far fewer round trips between client and DB than using Oracle directly, so it is fast over the WAN
- Because it is HTTP based, caching web proxies (typically Squid) can provide hierarchical, highly scalable cache based data access; reuse of data among jobs is high, so caching is effective at off-loading Oracle
[Figure: CMS Frontier deployment]
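Because the pattern is plain HTTP, it can be illustrated with a generic request routed through a site Squid. The Frontier server URL, the query parameters and the proxy address below are placeholders; the real Frontier client encodes its queries differently.

    # Frontier-style conditions access: the query travels as an HTTP GET, and a
    # local Squid answers repeated requests from its cache, so the central
    # Oracle/Frontier servers see each distinct query only rarely.
    # Server, proxy and query encoding here are illustrative placeholders.
    import requests

    FRONTIER = "http://frontier.example.org:8000/Frontier"
    PROXIES = {"http": "http://squid.mysite.example.org:3128"}   # site proxy cache

    response = requests.get(FRONTIER,
                            params={"p1": "<encoded conditions query>"},
                            proxies=PROXIES)
    print("%s, %d bytes" % (response.status_code, len(response.content)))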