Distributed Data Access and Analysis for Next Generation HENP Experiments
Harvey Newman, Caltech
CHEP 2000, Padova, February 10, 2000
LHC Computing: Different from Previous Experiment Generations
Geographical dispersion: of people and of resources
Complexity: the detector and the LHC environment
Scale: Petabytes per year of data
~5000 physicists, 250 institutes, ~50 countries
Major challenges associated with:
Coordinated use of distributed computing resources
Remote software development and physics analysis
Communication and collaboration at a distance
R&D: a new form of distributed system, the Data Grid
Four Experiments: The Petabyte to Exabyte Challenge
ATLAS, CMS, ALICE, LHCb: Higgs and new particles; quark-gluon plasma; CP violation
Data written to "tape": ~5 Petabytes/year and up (1 PB = 10^15 Bytes)
0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (~2010) (~2020?), total for the LHC experiments
To Solve: the LHC "Data Problem"
While the proposed LHC computing and data handling facilities are large by present-day standards, they will not support FREE access, transport or processing for more than a minute part of the data
A balance must be struck between proximity to large computational and data handling facilities, and proximity to end users and more local resources for frequently accessed datasets
Strategies must be studied and prototyped, to ensure both acceptable turnaround times and efficient resource utilisation
Problems to be explored:
How to meet the demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores
How to prioritise hundreds of requests from local and remote communities, consistent with local and regional policies
How to ensure that the system is dimensioned, used and managed optimally, for the mixed workload
MONARC General Conclusions on LHC Computing
Following discussions of computing and network requirements, technology evolution and projected costs, support requirements, etc.:
The scale of LHC computing requires a worldwide effort to accumulate the necessary technical and financial resources
A distributed hierarchy of computing centres will lead to better use of the financial and manpower resources of CERN, the Collaborations, and the nations involved than a highly centralized model focused at CERN
The distributed model also provides better use of physics opportunities at the LHC by physicists and students
At the top of the hierarchy is the CERN Centre, with the ability to perform all analysis-related functions, but not the capacity to do them completely
At the next step in the hierarchy is a collection of large, multi-service "Tier1 Regional Centres", each with 10-20% of the CERN capacity devoted to one experiment
There will be Tier2 or smaller special-purpose centres in many regions
Bandwidth Requirements Estimate (Mbps) [*] - ICFA Network Task Force

                                                    1998                2000            2005
BW utilized per physicist (and peak BW used)        0.05-0.25 (0.5-2)   0.2-2 (2-10)    0.8-10 (10-100)
BW utilized by a university group                   0.25-10             1.5-45          34-622
BW to a home laboratory or regional center          1.5-45              34-155          622-5000
BW to a central laboratory housing one or more
  major experiments                                 34-155              155-622         2500-10000
BW on a transoceanic link                           1.5-20              34-155          622-5000
[*] See http://l3www.cern.ch/~newman/icfareq98.html
Circa 2000, the predictions are roughly on track: "universal" BW growth of ~2X per year;
622 Mbps links, European and transatlantic, by ~2002-3; Terabit/sec US backbones (e.g. ESNet) by ~2003-5
Caveats: distinguish raw bandwidth from effective line capacity, and from the maximum end-to-end rate for individual data flows; "QoS"/IP has a way to go
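As an added, illustrative consistency check (not part of the original table): starting from the ~155 Mbps upper end of the year-2000 transoceanic figures, the "~2X per year" growth rule gives
\[
155~\mathrm{Mbps}\times 2^{\,(2002-2000)} \;\approx\; 620~\mathrm{Mbps},
\]
which matches the ~622 Mbps European and transatlantic links expected by ~2002-3.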
CMS Analysis and Persistent Object Store
On-demand object creation
Data organized in a(n object) "hierarchy": Raw, Reconstructed (ESD), Analysis Objects (AOD), Tags
Data distribution:
All raw and reconstructed data, and the master parameter DBs, at CERN
All event TAGs and AODs, and selected reconstructed data sets, at each regional centre
HOT data (frequently accessed) moved to RCs
Goal of location and medium transparency
[Figure: CMS dataflow into the persistent object store. The online system (L1, L2/L3 and "L4" filtering, slow control, detector monitoring) feeds common filters and pre-emptive object creation; the persistent object store (an object database management system) then serves offline filtering, simulation, calibrations, group analyses and user analysis.]
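A minimal Java sketch of "on demand object creation" across the Raw -> ESD -> AOD hierarchy, added for illustration; the class names are invented, not the CMS persistent classes, and the reconstruction/summarizing steps are stand-ins. Each derived view is built only when first requested and cached thereafter.

```java
// Illustrative only: lazy ("on demand") creation of derived event objects.
// RawEvent/EsdEvent/AodEvent are invented stand-ins, not CMS classes.
class RawEvent { final byte[] detectorData;  RawEvent(byte[] d)   { detectorData = d; } }
class EsdEvent { final double[] hits;        EsdEvent(double[] h) { hits = h; } }
class AodEvent { final double summaryValue;  AodEvent(double v)   { summaryValue = v; } }

class Event {
    private final RawEvent raw;   // always persistent (kept at CERN in the CMS model)
    private EsdEvent esd;         // built (or fetched) only when first needed
    private AodEvent aod;

    Event(RawEvent raw) { this.raw = raw; }

    EsdEvent esd() {              // on-demand creation of the ESD view
        if (esd == null) esd = reconstruct(raw);
        return esd;
    }

    AodEvent aod() {              // AOD derived from the ESD, again on demand
        if (aod == null) aod = summarize(esd());
        return aod;
    }

    // Stand-in "reconstruction": real code would run pattern recognition etc.
    private static EsdEvent reconstruct(RawEvent r) {
        return new EsdEvent(new double[r.detectorData.length]);
    }

    // Stand-in "summary": real code would compute physics quantities.
    private static AodEvent summarize(EsdEvent e) {
        double sum = 0.0;
        for (double h : e.hits) sum += h;
        return new AodEvent(sum);
    }
}
```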
GIOD Summary
[Figure: hit, track and detector objects (Java 3D visualization).]
GIOD has:
Constructed a Terabyte-scale set of fully simulated CMS events, and used these to create a large OO database
Learned how to create large database federations
Completed the "100" (to 170) MByte/sec CMS milestone
Developed prototype reconstruction and analysis codes, and Java 3D OO visualization demonstrators, that work seamlessly with persistent objects over networks
Deployed facilities and database federations as useful testbeds for Computing Model studies
Data Grid Hierarchy (CMS Example)
[Figure: the Tier 0 - Tier 4 hierarchy and its links.
Tier 0: the CERN Computer Centre with the Offline Farm (~20 TIPS), fed by the Online System at ~100 MBytes/sec (detector output ~PBytes/sec; one bunch crossing per 25 ns, 100 triggers per second, ~1 MByte per event).
Tier 1: Regional Centres such as Fermilab (~4 TIPS) and the France, Italy and Germany Regional Centres, connected to CERN by ~2.4 Gbits/sec and ~622 Mbits/sec links (or by air freight).
Tier 2: centres of ~1 TIPS each, connected at ~622 Mbits/sec.
Tier 3: institutes (~0.25 TIPS) with physics data caches, connected at 100-1000 Mbits/sec.
Tier 4: physicists' workstations.
1 TIPS = 25,000 SpecInt95; a PC (today) = 10-15 SpecInt95.
Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.]
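A back-of-the-envelope conversion, added here using only the numbers quoted in the figure:
\[
1~\mathrm{TIPS} = 25{,}000~\mathrm{SpecInt95} \;\approx\; \frac{25{,}000}{10\text{-}15~\mathrm{SpecInt95/PC}} \;\approx\; 1{,}700\text{-}2{,}500~\text{(year-2000) PCs},
\]
so the ~20 TIPS CERN offline farm corresponds to roughly 34,000-50,000 such PCs.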
LHC (and HEP) Challenges of Petabyte-Scale Data
Technical requirements:
Optimize the use of resources with next-generation middleware
Co-locate and co-schedule resources and requests
Enhance database systems to work seamlessly across networks: caching, replication, mirroring
Balance proximity to centralized facilities against proximity to end users for frequently accessed data
Requirements of the worldwide collaborative nature of the experiments:
Make appropriate use of data analysis resources in each world region, conforming to local and regional policies
Involve scientists and students in each world region in front-line physics research, through an integrated collaborative environment
Time-Scale: CMS Recent "Events"
A PHASE TRANSITION in our understanding of the role of CMS Software and Computing occurred in October - November 1999:
"Strong coupling" of the S&C task, Trigger/DAQ, the Physics TDR, detector performance studies and other main milestones
Integrated CMS Software and Trigger/DAQ planning for the next round: the May 2000 milestone
Large simulated samples are required: ~1 million events fully simulated, a few times during 2000, in ~1 month
A smoothly rising curve of computing and data handling needs from now on
Mock Data Challenges from 2000 (1% scale) to 2005
Users want substantial parts of the functionality formerly planned for 2005, starting now
Roles of Projects for HENP Distributed Analysis
RD45, GIOD: networked object databases
Clipper/GC, FNAL/SAM: high-speed access to object or file data for processing and analysis
SLAC/OOFS: distributed file system + Objectivity interface
NILE, Condor: fault-tolerant distributed computing with heterogeneous CPU resources
MONARC: LHC Computing Models: architecture, simulation, strategy, politics
PPDG: first distributed data services and Data Grid system prototype
ALDAP: database structures and access methods for astrophysics and HENP data
GriPhyN: production-scale Data Grid
APOGEE: simulation/modeling, application and network instrumentation, system optimization/evaluation
MONARC: Common Project
Models Of Networked Analysis At Regional Centres
Caltech, CERN, Columbia, FNAL, Heidelberg, Helsinki, INFN, IN2P3, KEK, Marseilles, MPI Munich, Orsay, Oxford, Tufts
PROJECT GOALS
Develop "Baseline Models"
Specify the main parameters characterizing the Model's performance: throughputs, latencies
Verify resource requirement baselines (computing, data handling, networks)
TECHNICAL GOALS
Define the Analysis Process
Define RC Architectures and Services
Provide Guidelines for the final Models
Provide a Simulation Toolset for further Model studies
[Figure: Model circa 2005. CERN: 350k SI95, 350 TBytes of disk, tape robot. A Tier1 centre such as FNAL/BNL: 70k SI95, 70 TBytes of disk, robot. A Tier2 centre: 20k SI95, 20 TB of disk, robot. Universities (Univ 1, Univ 2, ..., Univ M) and the centres are interconnected by 622 Mbits/s links (N x 622 Mbits/s where indicated).]
MONARC Working Groups/Chairs
"Analysis Process Design": P. Capiluppi (Bologna, CMS)
"Architectures": Joel Butler (FNAL, CMS)
"Simulation": Krzysztof Sliwa (Tufts, ATLAS)
"Testbeds": Lamberto Luminari (Rome, ATLAS)
"Steering" & "Regional Centres Committee": Laura Perini (Milan, ATLAS), Harvey Newman (Caltech, CMS)
MONARC Architectures WG: Regional Centre Facilities & Services
Regional Centres should provide:
All technical and data services required to do physics analysis
All Physics Objects, Tags and Calibration data
A significant fraction of the raw data
Caching or mirroring of calibration constants
Excellent network connectivity to CERN and the region's users
Manpower to share in the development of common validation and production software
A fair share of post- and re-reconstruction processing
Manpower to share in ongoing work on common R&D projects
Excellent support services for training, documentation and troubleshooting at the Centre or at remote sites served by it
Service to members of other regions
A long-term commitment for staffing, hardware evolution and support for R&D, as part of the distributed data analysis architecture
MONARC and Regional Centres
MONARC RC Forum: representative meetings quarterly
Regional Centre planning is well advanced, with an optimistic outlook, in the US (FNAL for CMS; BNL for ATLAS), France (CCIN2P3), Italy and the UK; proposals were submitted in late 1999 or early 2000
Active R&D and prototyping is underway, especially in the US, Italy and Japan; also in the UK (LHCb), Russia (MSU, ITEP) and Finland (HIP)
Discussions in the national communities are also underway in Japan, Finland, Russia and Germany
There is a near-term need to understand the level and sharing of support for LHC computing between CERN and the outside institutes, to enable the planning in several countries to advance
MONARC uses the traditional 1/3 : 2/3 sharing assumption
Regional Centre Architecture: Example by I. Gaines (MONARC)
[Figure: block diagram of a Regional Centre.
Inputs: network from CERN; network from Tier 2 and simulation centres; tapes.
Core services: tape mass storage and disk servers; database servers.
Processing streams: production reconstruction (Raw/Sim -> ESD; scheduled, predictable; experiment/physics groups), production analysis (ESD -> AOD, AOD -> DPD; scheduled; physics groups), and individual analysis (AOD -> DPD and plots; chaotic; physicists on desktops).
Support services: physics software development, R&D systems and testbeds; info servers and code servers; web servers and telepresence servers; training, consulting and help desk.
Outputs: to Tier 2 centres, local institutes, CERN, and tapes.]
Data Grid: Tier2 Layer
Create an ensemble of (university-based) Tier2 Data Analysis Centres, with site architectures complementary to the major Tier1 lab-based centres:
Medium-scale Linux CPU farm, Sun data server, RAID disk array
Less need for 24 x 7 operation; some lower component costs
Less production-oriented, to respond to local and regional analysis priorities and needs
Supportable by a small local team and physicists' help
One Tier2 Centre in each region (e.g. of the US):
Catalyze local and regional focus on particular sets of physics goals
Encourage coordinated analysis developments emphasizing particular physics aspects or subdetectors; example: CMS EMU in the Southwest US
Emphasis on training and on the involvement of students at universities in front-line data analysis and physics results
Include a high-quality environment for desktop remote collaboration
MONARC Analysis Process Example
[Figure: analysis process flow, starting from the DAQ/RAW and Slow Control/Calibration data.]
MONARC Analysis Model Baseline: ATLAS or CMS "Typical" Tier1 RC

CPU power               ~100 kSI95
Disk space              ~100 TB
Tape capacity           300 TB, 100 MB/sec
Link speed to Tier2     10 MB/sec (1/2 of 155 Mbps)

Raw data            1%     10-15 TB/year
ESD data            100%   100-150 TB/year
Selected ESD        25%    5 TB/year      [*]
Revised ESD         25%    10 TB/year     [*]
AOD data            100%   2 TB/year      [**]
Revised AOD         100%   4 TB/year      [**]
TAG/DPD             100%   200 GB/year
Simulated data      25%    25 TB/year     (repository)

[*] Covering five analysis groups, each selecting ~1% of the annual ESD or AOD data for a typical analysis
[**] Covering all analysis groups
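A quick arithmetic check of footnote [*], added for clarity using only the table's numbers:
\[
5~\text{groups}\;\times\;1\%\;\times\;(100\text{-}150)~\mathrm{TB/yr}\;\approx\;5\text{-}7.5~\mathrm{TB/yr},
\]
consistent with the ~5 TB/year quoted for Selected ESD.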
MONARC Testbeds WG: Isolation of Key Parameters
Some parameters have been measured, installed in the MONARC simulation models, and used in the first-round validation of the models:
The Objectivity AMS response time-function, and its dependence on
object clustering, page size, data class-hierarchy and access pattern
mirroring and caching (e.g. with the Objectivity DRO option)
Scalability of the system under "stress":
performance as a function of the number of jobs, relative to the single-job performance
Performance and bottlenecks for a variety of data access patterns
Tests over LANs and WANs
MONARC Testbeds WG
Test-bed configuration defined and widely deployed
"Use Case" applications using Objectivity:
GIOD/JavaCMS, CMS test beams, ATLASFAST++, the ATLAS 1 TB milestone
Both LAN and WAN tests
ORCA4 (CMS): the first "production" application
Realistic data access patterns
Disk/HPSS
"Validation" milestone carried out, with the Simulation WG
MONARC Testbed Systems
Multitasking Processing Model
Concurrently running tasks share resources (CPU, memory, I/O)
"Interrupt"-driven scheme: for each new task, or when one task finishes, an interrupt is generated and all "processing times" are recomputed
It provides:
an easy way to apply different load-balancing schemes
an efficient mechanism to simulate multitask processing
A Java 2-based, CPU- and code-efficient simulation for distributed systems has been developed, using process-oriented discrete event simulation (see the sketch below)
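The following is a minimal, self-contained Java sketch of such an interrupt-driven, processor-sharing scheme. It is not the MONARC simulation code; the class names and the equal-share load-balancing rule are illustrative assumptions.

```java
// Minimal sketch (not the MONARC code): tasks share a CPU, and their
// remaining times are recomputed at every "interrupt" (arrival or finish).
import java.util.*;

class Task {
    final String name;
    double remainingWork;          // CPU work still to do, in SI95*seconds
    Task(String name, double work) { this.name = name; this.remainingWork = work; }
}

class SharedCpu {
    private final double power;    // total CPU power, in SI95
    private final List<Task> active = new ArrayList<>();
    private double now = 0.0;      // simulated time, in seconds

    SharedCpu(double power) { this.power = power; }

    // Advance simulated time, charging each active task its share of the CPU.
    private void advanceTo(double t) {
        if (active.isEmpty() || t <= now) { now = Math.max(now, t); return; }
        double share = power / active.size();          // equal-share load balancing
        for (Task task : active) task.remainingWork -= share * (t - now);
        now = t;
    }

    // "Interrupt": a new task arrives; from here on the per-task CPU share,
    // and hence every remaining processing time, is different.
    void submit(double arrivalTime, Task t) {
        advanceTo(arrivalTime);
        active.add(t);
    }

    // Run until all tasks finish; each completion is the next interrupt.
    void runToCompletion() {
        while (!active.isEmpty()) {
            double share = power / active.size();
            Task next = Collections.min(active, Comparator.comparingDouble(t -> t.remainingWork));
            double finishTime = now + next.remainingWork / share;
            advanceTo(finishTime);
            active.remove(next);
            System.out.printf("t=%.1fs  %s finished%n", now, next.name);
        }
    }
}

public class MultitaskDemo {
    public static void main(String[] args) {
        SharedCpu cpu = new SharedCpu(100.0);          // e.g. a 100 SI95 node
        cpu.submit(0.0, new Task("reco-job", 500.0));
        cpu.submit(2.0, new Task("analysis-job", 100.0));
        cpu.runToCompletion();
    }
}
```

In this example the shorter job finishes at t = 4 s and the longer one at t = 6 s; the interrupts are the submission at t = 2 s and each completion, at which point all processing times are recomputed.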
Role of Simulation for Distributed Systems
Simulations are widely recognized and used as essential tools for the design, performance evaluation and optimisation of complex distributed systems
From battlefields to agriculture; from the factory floor to telecommunications systems
Discrete event simulations with an appropriate and high level of abstraction
Just beginning to be part of the HEP culture
Some experience in trigger, DAQ and tightly coupled computing systems: CERN CS2 models (event-oriented)
MONARC (process-oriented; Java 2 threads + class library)
These simulations are very different from HEP "Monte Carlos": "time" intervals and interrupts are the essentials
Simulation is a vital part of the study of site architectures, network behavior, and data access/processing/delivery strategies, for HENP Grid design and optimization
Example: Physics Analysis at Regional Centres
Similar data processing jobs are performed in each of several RCs
Each Centre has the "TAG" and "AOD" databases replicated
The Main Centre provides the "ESD" and "RAW" data
Each job processes AOD data, and also a fraction of the ESD and RAW data
Example: Physics Analysis
Simple Validation Measurements: The AMS Data Access Case
[Figure, left: mean time per job (ms) versus number of concurrent jobs (up to ~32), for raw-data DB access over a LAN from a 4-CPU client, comparing simulation and measurement.
Figure, right: distribution of the 32 jobs' processing times on monarc01; simulation mean 109.5, measurement mean 114.3.]
MONARC Phase 3
Involving CMS, ATLAS, LHCb and ALICE
Timely and useful impact:
Facilitate the efficient planning and design of mutually compatible site and network architectures and services, among the experiments, the CERN Centre and the Regional Centres
Provide modelling consultancy and service to the experiments and Centres
Provide a core of advanced R&D activities, aimed at LHC computing system optimisation and production prototyping
Take advantage of work on distributed data-intensive computing for HENP this year in other "next generation" projects [*], for example PPDG
MONARC Phase 3
Technical goal: system optimisation (maximise throughput and/or reduce long turnaround)
Phase 3 system design elements:
RESILIENCE, resulting from flexible management of each data transaction, especially over WANs
SYSTEM STATE & PERFORMANCE TRACKING, to match and co-schedule requests and resources, and to detect or predict faults
FAULT TOLERANCE, resulting from robust fall-back strategies to recover from bottlenecks or abnormal conditions
Base developments on large-scale testbed prototypes at every stage: for example ORCA4
[*] See H. Newman, http://www.cern.ch/MONARC/progress_report/longc7.html
MONARC Status
MONARC is well on its way to specifying baseline Models representing cost-effective solutions to LHC Computing
Discussions have shown that LHC computing has a new scale and level of complexity
A Regional Centre hierarchy of networked centres appears to be the most promising solution
A powerful simulation system has been developed, and is a very useful toolset for further model studies
Synergy with other advanced R&D projects has been identified
Important information and example Models have been provided, timely for the Hoffmann Review and the discussions of LHC Computing over the next months
MONARC Phase 3 has been proposed: based on prototypes, with increasing detail and realism, and coupled to the Mock Data Challenges in 2000
The Particle Physics Data Grid (PPDG)
Coordinated reservation/allocation techniques; integrated instrumentation; DiffServ
First-year goal: optimized cached read access to 1-10 GBytes, drawn from a total data set of up to one Petabyte
DoE/NGI Next Generation Internet project: ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS
[Figure: two PPDG services.
Site-to-Site Data Replication Service, at 100 MBytes/sec, from a primary site (data acquisition, CPU, disk, tape robot) to a secondary site (CPU, disk, tape robot).
Multi-Site Cached File Access Service, linking a primary site (DAQ, tape, CPU, disk, robot), satellite sites (tape, CPU, disk, robot) and universities (CPU, disk, users).]
PPDG: Architecture for Reliable High-Speed Data Delivery
[Figure: component diagram. Object-based and file-based application services; cache manager; file access service; matchmaking service; cost estimation; file fetching service; file replication index; end-to-end network services; mass storage manager; resource management; and file movers on either side of the site boundary / security domain.]
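A minimal Java sketch of how the services named in the diagram might compose for a single cached read: the replication index lists replicas, the cost estimator prices each, a trivial matchmaker picks the cheapest, and the file mover populates the cache. The interfaces, method names and the "/cache/" path are hypothetical, not the PPDG APIs.

```java
// Illustrative sketch only: a PPDG-style cached file read.
import java.util.*;

interface ReplicaIndex  { List<String> replicasOf(String logicalFile); }            // File Replication Index
interface CostEstimator { double estimateSeconds(String replicaUrl, long bytes); }  // Cost Estimation
interface FileMover     { void fetch(String replicaUrl, String localPath); }        // File Fetching / File Mover

class CacheManager {
    private final Map<String, String> cache = new HashMap<>();  // logical name -> local path
    private final ReplicaIndex index;
    private final CostEstimator cost;
    private final FileMover mover;

    CacheManager(ReplicaIndex i, CostEstimator c, FileMover m) { index = i; cost = c; mover = m; }

    // Matchmaking reduced to its simplest form: pick the replica with the
    // lowest estimated delivery time, fetch it, and remember the local copy.
    String open(String logicalFile, long bytes) {
        String cached = cache.get(logicalFile);
        if (cached != null) return cached;                       // cache hit: no WAN traffic
        String best = index.replicasOf(logicalFile).stream()
                .min(Comparator.comparingDouble(r -> cost.estimateSeconds(r, bytes)))
                .orElseThrow(() -> new NoSuchElementException(logicalFile));
        String local = "/cache/" + logicalFile;
        mover.fetch(best, local);
        cache.put(logicalFile, local);
        return local;
    }
}
```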
Distributed Data Delivery and LHC Software Architecture
Software architectural choices:
Traditional, single-threaded applications
Allow for data arrival and reassembly
OR
Performance-oriented (complex):
I/O requests up front; multi-threaded; data-driven; respond to an ensemble of (changing) cost estimates (see the sketch below)
Possible code movement as well as data movement
Loosely coupled, dynamic
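A small Java illustration of the performance-oriented style (the names and the simulated remote read are invented for this sketch, not taken from any LHC framework): all I/O requests are issued up front on a thread pool, and processing proceeds in whatever order the data arrives.

```java
// Sketch: issue all object/file requests up front and process data-driven,
// i.e. handle whichever object arrives first, rather than reading serially.
import java.util.*;
import java.util.concurrent.*;

public class DataDrivenRead {
    public static void main(String[] args) throws Exception {
        List<String> objectIds = Arrays.asList("evt-001", "evt-002", "evt-003");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<byte[]> done = new ExecutorCompletionService<>(pool);

        // 1. Issue every I/O request up front.
        for (String id : objectIds) done.submit(() -> fetchRemote(id));

        // 2. Data-driven processing: take results in arrival order.
        for (int i = 0; i < objectIds.size(); i++) {
            byte[] payload = done.take().get();
            process(payload);
        }
        pool.shutdown();
    }

    // Stand-in for a remote read (e.g. over a WAN); latency varies per object.
    static byte[] fetchRemote(String id) throws InterruptedException {
        Thread.sleep((long) (Math.random() * 200));
        return id.getBytes();
    }

    static void process(byte[] payload) {
        System.out.println("processed " + new String(payload));
    }
}
```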
ALDAP: Accessing Large Data Archives in Astronomy and Particle Physics
ALDAP (NSF/KDI) Project: NSF Knowledge Discovery Initiative (KDI); Caltech, Johns Hopkins, FNAL (SDSS)
Explore advanced adaptive database structures and physical data storage hierarchies for archival storage of next-generation astronomy and particle physics data
Develop spatial indexes, novel data organizations, and distribution and delivery strategies, for efficient and transparent access to data across networks
Example: (Kohonen) maps for data "self-organization"
Create prototype network-distributed data query execution systems using Autonomous Agent workers
Explore commonalities and find effective common solutions for particle physics and astrophysics data
Beyond Traditional Architectures: Mobile Agents (Java Aglets)
"Agents are objects with rules and legs" -- D. Taylor
Mobile agents are reactive, autonomous, goal-driven and adaptive:
Execute asynchronously
Reduce network load: local conversations
Overcome network latency, and some outages
Adaptive: robust and fault tolerant
Naturally heterogeneous
Extensible concept: agent hierarchies
[Figure: an application served by a hierarchy of service agents and sub-agents.]
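A plain-Java caricature of these properties, added for illustration; this is not the Aglets API, and the DataSite interface and QueryAgent class are invented names. The agent carries its goal to the data, does all per-file work locally, and sends back only a short summary.

```java
// Illustrative only: not the Aglets API. A goal-driven agent that works
// locally at a data site and reports back a single summary message.
interface DataSite {
    java.util.List<String> localFiles(String pattern); // what is stored here
    void report(String summary);                        // the only WAN "conversation"
}

class QueryAgent implements Runnable {
    private final DataSite site;   // the host the agent has migrated to
    private final String goal;     // e.g. a dataset pattern to summarize

    QueryAgent(DataSite site, String goal) { this.site = site; this.goal = goal; }

    @Override public void run() {
        // Local conversation: all per-file work happens at the data site,
        // so only a short result, not the data itself, crosses the network.
        int matches = site.localFiles(goal).size();
        site.report(goal + ": " + matches + " matching files");
    }
}
```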
Grid Services Architecture [*]: Putting it all Together
Grid Fabric: archives, networks, computers, display devices, etc.; associated local services
Grid Services: protocols, authentication, policy, resource management, instrumentation, data discovery, etc.
Application Toolkits: remote visualization toolkit; remote computation toolkit; remote data toolkit; remote sensors toolkit; remote collaboration toolkit; ...
Applications: HEP data-analysis related applications
[*] Adapted from Ian Foster
Grid Hierarchy Goals: Better Resource Use and Faster Turnaround
Efficient resource use and improved responsiveness through:
Treatment of the ensemble of site and network resources as an integrated (loosely coupled) system
Resource discovery, query estimation (redirection), co-scheduling, prioritization, local and global allocations
Network and site "instrumentation": performance tracking, monitoring, forward prediction, problem trapping and handling
Exploitation of superior network infrastructures (national, land-based) per unit cost for frequently accessed data:
transoceanic links are relatively expensive; shorter links normally give higher throughput
Ease of development, operation, management and security, through the use of layered, (de facto) standard services
Grid Hierarchy Concept: Broader Advantages
Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region
Lower tiers of the hierarchy mean more local control
Partitioning of users into "proximate" communities for support, troubleshooting and mentoring
Partitioning of facility tasks, to manage and focus resources
"Grid" integration and common services are a principal means for effective worldwide resource coordination
An opportunity to maximize global funding resources and their effectiveness, while meeting the needs for analysis and physics
Grid Development Issues
Integration of applications with Grid middleware:
A performance-oriented user application software architecture is needed, to deal with the realities of data access and delivery
Application frameworks must work with system state and policy information ("instructions") from the Grid
ODBMSs must be extended to work across networks: "invisible" (to the DBMS) data transport, and catalog update
Interfacility cooperation at a new level, across world regions:
Agreement on the use of standard Grid components, services, security and authentication
Match with heterogeneous resources, performance levels, and local operational requirements
Consistent policies on the use of local resources by remote communities
Accounting and "exchange of value" software
Worldwide Integrated Distributed Systems for Dynamic Content Delivery, Circa 2000
Content Delivery Networks: a Web-enabled pre-"Data Grid"
Akamai, Adero, Sandpiper server networks:
1200 (and growing toward thousands of) network-resident servers
25 to 60 ISP networks; 25 to 30 countries
40+ corporate customers; ~$25 B capitalization
Resource discovery; build a "weathermap" of the server network (state tracking)
Query estimation; matchmaking/optimization; request rerouting; virtual IP addressing
Mirroring, caching
(1200) autonomous-agent implementation
The Need for a "Grid": the Basics
Computing for the LHC will never be "enough" to fully exploit the physics potential, or to exhaust the scientific potential of the collaborations
The basic Grid elements are required to make the ensemble of computers, networks and storage management systems function as a self-consistent system, implementing consistent (and complex) resource usage policies
A basic "Grid" will be an information-gathering, workflow-guiding, monitoring and repair-initiating entity, designed to ward off resource wastage (or meltdown) in a complex, distributed and somewhat "open" system
Without such information, experience shows that effective global use of such a large, complex and diverse ensemble of resources is likely to fail, or at the very least be sub-optimal
The time to accept the charge to build a Grid, for sober and compelling reasons, is now
Grid-like systems are starting to appear in industry and commerce, but Data Grids on the LHC scale will not be in production until significantly after 2005
Summary
The HENP/LHC data analysis problem: Petabyte-scale compact binary data, and computing resources, distributed worldwide
Development of an integrated, robust, networked data access, processing and analysis system is mission-critical
An aggressive R&D program is required to develop reliable, seamless systems that work across an ensemble of networks
An effective inter-field partnership is now developing through many R&D projects (PPDG, GriPhyN, ALDAP, ...)
HENP analysis is now one of the driving forces for the development of "Data Grids"
Solutions to this problem could be widely applicable in other scientific fields and in industry, by LHC startup: national and multi-national "Enterprise Resource Planning"