Technical Evolution in LHC Computing (with an ATLAS slant)
Torre Wenaus
Brookhaven National Laboratory
July 18, 2012
GRID’2012 Conference
JINR, Dubna
[Image: Fabiola Gianotti, ATLAS Spokesperson, Higgs search seminar, July 4 2012, CERN]
This Talk
- LHC computing has been an outstanding success: experiments, grids, facilities, middleware providers, and CERN IT collaborated closely and delivered
- Where do we go from here? Drawing on what we've learned, and the technical evolution around us, what tools and technologies will best enable continued success?
- A hot topic as we approach the long shutdown of 2013-14
- Spring 2011: ATLAS established R&D programs across key technical areas
- Fall 2011: WLCG established Technical Evolution Groups (TEGs) across similar areas
- This talk draws on both in surveying LHC computing technical evolution activities and plans, subjectively rather than comprehensively
Computing Model Evolution in ATLAS
- Originally: static, strict hierarchy; multi-hop data flows; lesser demands on Tier 2 networking; virtue of simplicity
- Today: flatter, more fluid, mesh-like; sites contribute according to capability; greater flexibility and efficiency; more fully utilizes available resources
Principal enabler for the evolution: the network, with its excellent bandwidth, robustness, reliability, and affordability.
Requirements 2013+, in Light of Experience and Practical Constraints
- Scaling up: processing and storage growth with higher luminosity, energy, trigger rate, analysis complexity, ... with an ever increasing premium on efficiency and full utilization as resources get tighter and tighter
- Greater resiliency against site problems, especially in storage/data access, the most common failure mode
- Effectively exploit new computer architectures, dynamic networking technologies, and emergent technologies for large scale processing (NoSQL, clouds, whatever's next...)
- Maximal commonality: common tools, avoid duplicative efforts, consolidate where possible, for efficiency in cost and manpower
- All developments must respect continuity of operations; life goes on even in shutdowns
WLCG Technical Evolution Groups (TEGs)
- TEGs established Fall 2011 in the areas of databases, data management and storage, operations, security, and workload management
- Mandate: survey present practice, lessons learned, future needs, and the technology evolution landscape, and recommend a 2-5 year technical evolution strategy, consistent with continuous smooth operations and with a premium on commonality
- Reports last winter; follow-on activities recently defined
- Here follows a non-comprehensive survey across the technical areas of the principal TEG findings (relevant to technology evolution) and other technical developments and trends, with some ATLAS examples of technical evolution in action
Databases
- Great success with CORAL, the generic C++ interface to back end DBs (Oracle, Frontier, MySQL, ...). Support will be ensured, together with the overlaid COOL conditions DB
- Conditions databases evolved early away from direct reliance on the RDBMS; highly scalable thanks to Frontier, a web service interfacing Oracle to distributed clients served by proxy caches via a RESTful HTTP protocol
- Establishing Frontier/Squid as a full WLCG service
- Distributed apps (NoSQL):
  - Oracle as the big iron DB will not change; use ancillary services like Frontier and NoSQL to offload and protect it
  - NoSQL: R&D and prototyping has led to convergence on Hadoop as a NoSQL 'standard', at least for the moment
  - CERN is supporting a small instance now in pre-production; ATLAS is consolidating its NoSQL apps around Hadoop
Hadoop in ATLAS
- Distributed data processing framework: filesystem (HDFS), processing framework (MapReduce), distributed NoSQL DB (HBase), SQL front end (Hive), parallel data-flow language (Pig), ...
- Hadoop in the ATLAS data management system:
  - Aggregation and mining of log files: HDFS (3.5 TB) and MapReduce (70 min processing at 70 MB/s I/O); a minimal sketch follows below
  - Usage trace mining: HBase (300 insertions/sec, 80 GB/month, MapReduced in 2 min); migrated from Cassandra in two days
  - File sharing service: copied to HDFS and served via Apache
  - Wildcard file metadata search: HDFS and MapReduce
  - Accounting: a Pig data pipeline MapReduces 7 GB down to 500 MB of output summaries for Apache publication in 8 min
- Stable, reliable, fast, easy to work with, robust against hardware failure
- Complements Oracle by reducing load and expanding mining capability
- Future potential: storage system, parallel access to distributed data
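To make the MapReduce pattern behind the log mining concrete, here is a minimal Hadoop Streaming sketch in Python. The log layout (severity in the third whitespace-separated field) and the paths are assumptions for illustration, not the actual ATLAS DDM schema.

    #!/usr/bin/env python
    # mapper.py -- minimal Hadoop Streaming mapper (illustrative sketch).
    # Assumes one log record per line with the severity level in the third
    # whitespace-separated field; the real ATLAS DDM log layout may differ.
    import sys

    for line in sys.stdin:
        fields = line.split()
        if len(fields) >= 3:
            # Emit (severity, 1); Hadoop groups and sorts by key before the reducer.
            print("%s\t1" % fields[2])

    #!/usr/bin/env python
    # reducer.py -- sums the counts emitted by mapper.py for each severity key.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

Such a pair would be run with the Hadoop Streaming jar (input and output paths adjusted to the local installation); the same scripts also work as a plain shell pipeline for testing outside Hadoop.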
Data Management and Storage Evolution
- Federated storage based on xrootd for easy to use, homogeneous, single-namespace data access (a usage sketch follows below); NFS 4.1 is an option in principle but not nearly ready
- Support for HTTP storage endpoints and an HTTP data access protocol
- Wide area direct data access, to simplify data management and support simpler, more efficient workflows
- Caching data dynamically and intelligently for efficient storage usage, minimal latencies, and simplified storage management
- Point to point protocols needed: GridFTP, xrootd, HTTP, (S3?), without SRM overheads
- Adapt the FTS managed transfer service to new requirements: endpoint based configuration, xrootd and HTTP support (FTS3)
- On their way out: the LFC file catalog and the SRM storage interface (except for archive)
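As a sketch of what single-namespace access through a federation looks like from a user job, assuming PyROOT; the redirector hostname, file path and tree name below are placeholders rather than real ATLAS endpoints.

    # Open a file by its global name through an xrootd federation redirector
    # instead of copying it locally; the redirector transparently locates a
    # replica. Hostname, path and tree name are illustrative placeholders.
    import ROOT

    url = "root://federation-redirector.example.org//atlas/data/some_file.root"
    f = ROOT.TFile.Open(url)
    if f and not f.IsZombie():
        tree = f.Get("events")
        print("Opened %s, %d entries" % (url, tree.GetEntries()))
    else:
        print("Could not open %s via the federation" % url)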
Data Placement with Federation
[Diagram from the storage & data management TEGs]
Why xrootd as Basis for Storage Federation?
- Xrootd is mature, proven for over a decade, and tuned to HEP requirements
- Stable, responsive support
- Commonality (ALICE, ATLAS, CMS): ALICE has run xrootd in production for years, the CMS federation is in pre-production, and the ATLAS federation is emerging from R&D
- Very efficient at file discovery
- Transparent use of redundant replicas to overcome file unavailability; can be used to repair placed data
- Works seamlessly with the ROOT data formats used for event data, with close ROOT/xrootd collaboration
- Supports efficient WAN data access
- Interfaces to hierarchical caches and stores
- Uniform interface to back end storage: dCache, DPM, GPFS, HDFS, xrootd, ...
- Good monitoring tools
[Map: the current ATLAS xrootd federation, now expanding to the EU]
WAN Data Access and Caching
- I/O optimization work in ROOT and the experiments in recent years makes WAN data access efficient and viable
- Caching at source and destination further improves efficiency:
  - Source: migrate frequently used data to a fast (e.g. SSD) cache
  - Destination: an addressable shared cache reduces latency for jobs sharing or reusing data and reduces load on the source; requires cache awareness in the brokerage to drive re-use
- Asynchronous pre-fetch at the client improves things further, so WAN latencies don't impede processing; now supported by ROOT (a sketch of the client-side settings follows below)
- A major simplification to data and workflow management: no more choreography in moving data to processing or vice versa
- Demands ongoing I/O performance analysis and optimization, knowledge of access patterns (e.g. fraction of file read), and I/O monitoring; collaborative work between the ROOT team and the experiments
- Will benefit from integrating network monitoring into workflow brokerage
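A minimal PyROOT sketch of the client-side I/O settings involved: a TTreeCache so that many baskets travel per network round trip, plus the asynchronous prefetch switch mentioned above. The file URL, tree name and cache size are illustrative assumptions.

    # WAN-friendly reading: TTreeCache batches basket reads, and asynchronous
    # prefetching overlaps the next cache fill with processing of the current one.
    # The remote URL and tree name are placeholders, not real ATLAS data.
    import ROOT

    ROOT.gEnv.SetValue("TFile.AsyncPrefetching", 1)   # background prefetch thread

    f = ROOT.TFile.Open("root://remote-site.example.org//data/sample.root")
    tree = f.Get("events")
    tree.SetCacheSize(30 * 1024 * 1024)   # 30 MB TTreeCache
    tree.AddBranchToCache("*", True)      # cache every branch that gets read

    for i in range(int(tree.GetEntries())):
        tree.GetEntry(i)                  # served from the cache, not basket by basket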
Operations & Tools
- Strong desire for "Computing as a Service" (CaaS), especially at smaller sites:
  - Easy to install, easy to use, standards-compliant packaged services managed and sustained with very low manpower
  - Graceful reactions to fluctuations in load and user behavior
  - Straightforward and rapid problem diagnosis and remedy
  - "Cloud is the ultimate CaaS"; being acted on in ATLAS, where cloud-based Tier 3s are currently being prototyped
- More commonality among tools: availability monitoring, site monitoring, network monitoring
- Carefully evaluate how much middleware has a long term future
- Adopt CVMFS for use as a shared software area at all WLCG sites; in production for ATLAS and LHCb, in deployment for CMS
[Chart: ATLAS CVMFS deployment status; the aim is 100% of sites by end 2012]
CERNVM File System (CVMFS)
- Caching HTTP file system optimized for software delivery
- Efficient: compression over the wire, duplicate detection
- Scalable: works hand in hand with proxy caches
- Originally developed as a lightweight distributed file system for CERNVM virtual machines: keep the VM footprint small and provision software transparently via HTTP and FUSE, caching exactly what you need
- Adopted well beyond the CERNVM community and very successful for general software (and other file data) distribution: ATLAS, CMS, LHCb, Geant4, LHC@Home 2.0, ...
- Currently 75M objects, 5 TB in the CERN repositories
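From the client side no special code is needed: a repository simply appears as a read-only POSIX tree under /cvmfs, mounted via FUSE, with files fetched over HTTP and cached locally on first access. A trivial check, with an illustrative repository name:

    # CVMFS repositories look like ordinary read-only directories under /cvmfs;
    # only the catalogs and files actually touched are fetched and cached.
    # The repository name below is illustrative.
    import os

    repo = "/cvmfs/atlas.cern.ch"
    if os.path.isdir(repo):
        print("\n".join(sorted(os.listdir(repo))[:10]))   # peek at the top level
    else:
        print("%s is not mounted on this node" % repo)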
Security
- No longer any security through obscurity, and the lack of obscurity makes security incidents all the worse (press and publicity)
- Need fine-grained traceability to quickly contain and investigate incidents and prevent recurrence
- Bring logging up to standard across all services
- Address in the future through private clouds? A VM is isolated and is itself fully traceable
- Incorporate identity federations (next slide); centralized identity management also enables easy banning
Identity Management
[Diagrams (R. Wartel): traditional vs. federated access to grid services]
- Identity management today: complex for users, admins, VOs, and developers; ID management is fragmented and generally non-local; a grid identity is not easily used in other contexts
- Federated ID management: common trust/policy framework; single sign-on; ID managed in one place; the ID provider manages attributes and the service provider consumes them; supports services besides the grid: collaborative tools, wikis, mailing lists, ...
- A WLCG pilot project is getting started
Workload Management
- Pilots, pilots everywhere! The approach spread from ALICE and LHCb to all experiments
- Experiment-level workload management systems are successful; no long term need for the WMS as a middleware component
- But is WMS commonality still possible then? Yes:
  - DIRAC (LHCb) is used by other experiments, as is AliEn (ALICE)
  - PanDA (ATLAS) has new (DOE) support to generalize as an exascale computing community WMS
  - ATLAS, CMS and CERN IT are collaborating on a common analysis framework drawing on PanDA and glideinWMS (pilot layer)
- Pilot submission and management works well on a foundation of Condor and glideinWMS (which is itself a layer over Condor); see the pilot sketch below
- Can the CE be simplified? Under active discussion
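The pilot idea itself is simple, and can be caricatured in a few lines: a generic job lands on a worker node, then pulls its real payload from the experiment's central queue. The dispatcher URL and message fields below are invented for illustration; this is not the real PanDA, DIRAC or AliEn protocol.

    # Schematic pilot: started by the batch system with no fixed payload, it asks
    # the central workload management server for work matched to this resource,
    # runs it, and reports back. Endpoint and JSON fields are hypothetical.
    import subprocess
    import requests

    DISPATCHER = "https://wms.example.org"   # placeholder, not a real WMS endpoint

    def run_pilot():
        job = requests.get(DISPATCHER + "/getJob",
                           params={"site": "EXAMPLE_SITE"}).json()
        if not job:
            return                           # nothing queued; exit quietly
        status = subprocess.call(job["command"], shell=True)
        requests.post(DISPATCHER + "/report",
                      data={"jobid": job["id"], "status": status})

    if __name__ == "__main__":
        run_pilot()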
ATLAS/CMS Common Analysis Framework Proposed Architecture
[Architecture diagram: common workflow engine]
Status: CMS able to submit PanDA jobs to the ATLAS PanDA server
Workload Management 2
- Multi-core and whole node support: support is pretty much there for simple usage modes, and simplicity is the aim, in particular whole node scheduling
- Extended environment information (e.g. HS06, job lifetime): environment variables providing homogeneous information access across batch system flavors (we could have done with this years ago); a sketch follows below
- Virtualization and cloud computing:
  - Virtual CE: better support for "any" batch system; essential
  - Virtualization is clearly a strong interest for sites, both for service hosting and for WNs
  - Cloud computing interest and activity levels vary among experiments; ATLAS is well advanced in integrating and testing cloud computing in real analysis and production workflows, thanks to a robust R&D program running since spring 2010
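A sketch of how a payload could consume such environment variables for whole node running, assuming the machine/job features convention in which a variable such as MACHINEFEATURES points to a directory of small descriptive files; the exact variable and file names here are assumptions.

    # Size the worker pool from information the site publishes through the
    # (assumed) MACHINEFEATURES directory, falling back to the visible core
    # count if the mechanism is not available on this node.
    import multiprocessing
    import os

    def allocated_slots():
        features = os.environ.get("MACHINEFEATURES")
        if features:
            try:
                with open(os.path.join(features, "jobslots")) as f:
                    return int(f.read().strip())
            except (IOError, ValueError):
                pass
        return multiprocessing.cpu_count()

    def process_chunk(i):
        return i            # stand-in for the real per-core payload

    if __name__ == "__main__":
        n = allocated_slots()
        pool = multiprocessing.Pool(n)
        print(pool.map(process_chunk, range(n)))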
Cloud Computing in ATLAS
Several use cases prototyped or in production:
- PanDA based data processing and workload management: centrally managed queues in the cloud, elastically expanding resources transparently to users (see the sketch below)
- Institute managed Tier 3 analysis clusters, hosted locally or (more efficiently) at a shared facility, T1 or T2
- Personal analysis queues: user managed, low complexity (almost transparent), transient
- Data storage: transient caching to accelerate cloud processing; object storage and archiving in the cloud
[Plot: PanDA production in the cloud; concurrent jobs at Canadian sites]
[Diagram: personal PanDA analysis site prototype on Amazon EC2, with a personal pilot factory, an EC2 analysis cloud, and ActiveMQ monitoring of the Amazon analysis cloud status]
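A minimal sketch of the elastic expansion idea through the EC2 API with boto; the AMI, key pair, instance type and contextualization script are placeholders, not the configuration ATLAS actually uses.

    # Add worker-node VMs to a cloud-resident queue when demand grows; each VM
    # boots a (hypothetical) contextualization script that starts pulling pilots.
    # All identifiers below are illustrative only.
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")   # credentials from the environment

    USER_DATA = """#!/bin/bash
    # contextualization: configure the node and start the pilot factory
    /opt/cloud/start-pilots.sh
    """

    reservation = conn.run_instances(
        "ami-00000000",                # placeholder worker-node image
        min_count=1, max_count=10,     # expand by up to ten nodes at a time
        instance_type="m1.large",
        key_name="cloud-worker",
        user_data=USER_DATA)

    for instance in reservation.instances:
        print("started %s" % instance.id)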
Networking
- The network has been key to the evolution to date, and will be key in the future as well: dynamic intelligent networks, comprehensive monitoring, and experiment workflow and dataflow systems integrated with network monitoring and controls
- Network-aware brokerage
- Recent WLCG agreement to establish a working group on better integrating networking as an integral component of LHC computing
- LHCONE: progressively coming online to provide entry points for T1, T2 and T3 sites into a dedicated LHC network fabric, complementing LHCOPN as the network that serves T0-T1 traffic
LHCONE and Dynamic Networks
- The LHCONE initiative was established in 2010 to (first and foremost) better support and integrate T2 networking, given the strong role of T2s in the evolving LHC computing models: managed, segregated traffic for guaranteed performance; traffic engineering and flow management to fully utilize resources; leveraging advanced networking
- LHCONE services being constructed:
  - Multipoint virtual network (logical traffic separation)
  - Static and dynamic point to point circuits for guaranteed bandwidth, high-throughput data movement
  - Comprehensive end-to-end monitoring and diagnostics across all sites and paths
- Software Defined Networking (OpenFlow) is the probable technology for LHCONE in the long term; under investigation as R&D, converging on a 2 year timescale (the end of the shutdown)
- In the near term, establish a point-to-point dynamic circuits pilot
Conclusions
Focus for the future:
- Leverage the network
- Monitoring, diagnostics
- Robustness, reliability
- Simplicity, ease of use
- Support, sustainability
- Pilots, Condor
- Federation
- Oracle + NoSQL
- Caching, WAN access
- HTTP, RESTful APIs
- Virtualization
- Commonality
- Open standards
- Leverage beyond HEP
- Left out of the focus areas: scalability. Of course it is a principal one, but the success of LHC computing to date shows that we have a handle on scalability
- Thanks to the success to date, the future will be evolutionary, not revolutionary
- For the future, focus on the technical tools and approaches most key to having achieved scalability and manageability, and on the new tools and technologies with the greatest promise to do the same in the future, so as to ensure sustainability in the face of ever more challenging requirements and shrinking manpower
- On the foundation of the powerful networking that we are still learning to exploit to its fullest, and on the foundation of the Grid, which has served us well and is evolving with us
- Farewells to: LFC, WMS, SRM (mostly)
Thanks
This talk drew on presentations, discussions, comments and input from many people. Thanks to all, including those I've missed: Artur Barczyk, Doug Benjamin, Ian Bird, Jakob Blomer, Simone Campana, Dirk Duellmann, Andrej Filipcic, Rob Gardner, Alexei Klimentov, Mario Lassnig, Fernando Megino, Dan Van Der Ster, Romain Wartel, ...
Related Talks at GRID’12
- I. Bird (Monday)
- A. Klimentov, Distributed computing challenges, HEP computing beyond the grid (Tuesday)
- A. Vaniachine, Advancements in Big Data Processing (Tuesday)
- O. Smirnova, Implementation of common technologies in grid middleware (Tuesday)
- A. Tsaregorodtsev, DIRAC middleware for distributed computing systems (Tuesday)
- K. De, Status and evolution of ATLAS workload management system PanDA (Tuesday)
- ATLAS computing workshop (Tuesday)
- D. Duellmann, Storage strategy and cloud storage evaluations at CERN (Wednesday)
More Information
- WLCG Technical Evolution Groups https://twiki.cern.ch/twiki/bin/view/LCG/WebHome
- Frontier talks at CHEP 2012:
  - Comparison to NoSQL, D. Dykstra http://goo.gl/n0fcp
  - CMS experience, D. Dykstra http://goo.gl/dYovr
  - ATLAS experience, A. Dewhurst http://goo.gl/OIXsH
- NoSQL & Hadoop in ATLAS DDM, M. Lassnig, CHEP 2012 http://goo.gl/0ab4I
- Xrootd http://xrootd.slac.stanford.edu/
- Next generation WLCG File Transfer Service (FTS), Z. Molnar, CHEP 2012 http://goo.gl/mZWtp
- Evolution of ATLAS PanDA System, T. Maeno, CHEP 2012 http://goo.gl/lgDfU
- AliEn http://alien2.cern.ch/
- DIRAC http://diracgrid.org/
- CVMFS http://cernvm.cern.ch/portal/filesystem
- Status and Future Perspectives of CVMFS, J. Blomer, CHEP 2012 http://goo.gl/82csr
- LHCONE http://lhcone.web.cern.ch/
Supplementary
Frontier/Squid for Scalable Distributed DB Access
- The LHC experiments keep time-dependent data on detector conditions, alignment etc. in Oracle at CERN
- ATLAS alone runs 120,000+ jobs concurrently around the world, and many of these need conditions data access
- How to provide scalable access to centrally Oracle-resident data? The answer, first developed by CDF/CMS at FNAL and adopted for the LHC: Frontier
- Frontier is a web service that translates DB queries into HTTP, with far fewer round trips between client and DB than using Oracle directly, so it is fast over the WAN
- Because it is HTTP based, caching web proxies (typically Squid) can provide hierarchical, highly scalable cache based data access; reuse of data among jobs is high, so caching is effective at off-loading Oracle
[Figure: CMS Frontier deployment]
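Because the pattern is plain HTTP, it can be illustrated with a generic request routed through a site Squid. The Frontier server URL, the query parameters and the proxy address below are placeholders; the real Frontier client encodes its queries differently.

    # Frontier-style conditions access: the query travels as an HTTP GET, and a
    # local Squid answers repeated requests from its cache, so the central
    # Oracle/Frontier servers see each distinct query only rarely.
    # Server, proxy and query encoding here are illustrative placeholders.
    import requests

    FRONTIER = "http://frontier.example.org:8000/Frontier"
    PROXIES = {"http": "http://squid.mysite.example.org:3128"}   # site proxy cache

    response = requests.get(FRONTIER,
                            params={"p1": "<encoded conditions query>"},
                            proxies=PROXIES)
    print("%s, %d bytes" % (response.status_code, len(response.content)))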