U.S. ATLAS Computing Facilities
DOE/NSF Review of US LHC Software & Computing Projects
Bruce G. Gibbard, BNL, 18-20 January 2000
19 January 2000, DOE/NSF Review of US LHC Software & Computing Projects, B. Gibbard
US ATLAS Computing Facilities

• Facilities procured, installed, and operated...
  – ...to meet U.S. "MOU" obligations
    • Direct IT responsibility (Monte Carlo, for example)
    • Support for detector construction, testing, & calibration
    • Support for software development and testing
  – ...to enable effective participation by US physicists in the ATLAS physics program
    • Direct access to and analysis of physics data sets
    • Support for simulation, re-reconstruction, and reorganization of data associated with that analysis
Setting the Scale

• Uncertainties in defining facilities scale
  – Five years of detector, algorithm & software development
  – Five years of computer technology evolution
• Start from ATLAS estimate & Regional Center guidelines
• Adjust for US ATLAS perspective (experience, priorities, and facilities model)
ATLAS Estimate & Guidelines

• Tier 1 Center in '05 should include...
  – 30,000 SPECint95 for Analysis
  – 20,000 SPECint95 for Simulation
  – 100 TBytes/year of On-line (Disk) Storage
  – 200 TBytes/year of Near-line (Robotic Tape) Storage
  – 100 Mbit/sec connectivity to CERN
• Assume no major raw data processing or handling outside of CERN
US ATLAS Perspective

• US ATLAS facilities must be adequate to meet any reasonable U.S. ATLAS computing needs (the U.S. role in ATLAS should not be constrained by a computing shortfall; rather, it should be enhanced by computing strength)
• There must be significant capacity beyond that formally committed to International ATLAS which can be allocated at the discretion of U.S. ATLAS
Facilities Architecture

• Consists of transparent, hierarchically distributed computing resources connected into a GRID
  – Primary ATLAS Computing Centre at CERN
  – US ATLAS Tier 1 Computing Center at BNL
    • National in scope, at ~20% of CERN
  – US ATLAS Tier 2 Computing Centers
    • Six, each regional in scope at ~20% of Tier 1
    • Likely one of them at CERN
  – US ATLAS institutional computing facilities
    • Institutional in scope, not project supported
  – US ATLAS individual desktop systems
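The ~20% scaling factors above compose multiplicatively; a minimal sketch (using an arbitrary CERN capacity of 1.0, not a figure from the talk) showing that the six Tier 2 centers together slightly exceed the Tier 1 center:

```python
# Hypothetical illustration of the ~20% tier-scaling rule in the
# architecture above; the CERN baseline of 1.0 is an arbitrary unit.
CERN_CAPACITY = 1.0

tier1 = 0.20 * CERN_CAPACITY     # Tier 1 at ~20% of CERN
tier2_each = 0.20 * tier1        # each Tier 2 at ~20% of Tier 1
tier2_total = 6 * tier2_each     # six Tier 2 centers combined

print(f"Tier 1:      {tier1:.2f}")        # 0.20
print(f"All Tier 2s: {tier2_total:.2f}")  # 0.24, i.e. > Tier 1
```

So under these rules the distributed Tier 2 layer is, in aggregate, a resource comparable to the national center itself.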
Schematic of Model

[Diagram: hierarchical GRID model. The ATLAS CERN Computing Center (International) connects across the Atlantic to the US ATLAS Tier 1 Computing Center (National), which connects to the US ATLAS Tier 2 Computing Centers (Regional); these serve Tier 3 Computing facilities (Institutional) and, via LAN, individual US ATLAS users.]
Distributed Model

• Rationale (benefits)
  – Improved user access to computing resources
    • Higher-performance regional networks
    • Local geographic travel
  – Enable local autonomy
    • Less widely shared resources
    • More locally managed
  – Increased capacities
    • Encourage integration of other equipment & expertise
      – Institutional, base program
    • Additional funding options
      – Com Sci, NSF
Distributed Model (2)

• But increased vulnerability (risk)
  – Increased dependence on the network
  – Increased dependence on GRID infrastructure software, and hence on R&D efforts
  – Increased dependence on facility modeling tools
  – More complex management
• Risk/benefit analysis must yield a positive result
Adjusted for U.S. ATLAS Perspective

• Total US ATLAS facilities in '05 should include...
  – 10,000 SPECint95 for Re-reconstruction
  – 85,000 SPECint95 for Analysis
  – 35,000 SPECint95 for Simulation
  – 190 TBytes/year of On-line (Disk) Storage
  – 300 TBytes/year of Near-line (Robotic Tape) Storage
  – Dedicated OC12 (622 Mbit/sec) Tier 1 connectivity to each Tier 2
  – Dedicated OC12 (622 Mbit/sec) to CERN
GRID Infrastructure
• GRID infrastructure software must supply
  – Efficiency (optimizing hardware use)
  – Transparency (optimizing user effectiveness)
• Projects
  – PPDG: distributed data services - Common Day talk by D. Malon
  – APOGEE: complete GRID infrastructure, including distributed resource management, modeling, instrumentation, etc.
  – GriPhyN: staged development toward delivery of a production system
• The alternative to success with these projects is an overall set of facilities that is cumbersome to use and/or reduced in efficiency
• U.S. ATLAS involvement includes ANL, BNL, LBNL
Facility Modeling

• Performance of a complex distributed system is difficult, but necessary, to predict
• MONARC - LHC-centered project
  – Provide a toolset for modeling such systems
  – Develop guidelines for designing such systems
  – Currently capable of relevant analyses
  – Common Day talk by K. Sliwa
Technology Trends
• CPU
  – Range: commodity processors -> SMP servers
  – Factor 2 decrease in price/performance in 1.5 years
• Disk
  – Range: commodity disk -> RAID disk
  – Factor 2 decrease in price/performance in 1.5 years
• Tape Storage
  – Range: desktop storage -> high-end storage
  – Factor 2 decrease in price/performance in 1.5 - 2 years
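The factor-2 declines above translate directly into a cost projection. A sketch, assuming a clean exponential with a 1.5-year halving time (the baseline dollar figure is illustrative, not from the talk):

```python
# Sketch: project the cost of fixed capability, assuming price/performance
# halves every `halving_years` (1.5 years for CPU and disk above).

def projected_cost(cost_today: float, years: float,
                   halving_years: float = 1.5) -> float:
    """Cost of a fixed capability `years` from now under exponential decline."""
    return cost_today * 0.5 ** (years / halving_years)

# e.g. hardware costing 100 (arbitrary units) per unit capability today
print(projected_cost(100.0, 1.5))  # 50.0 after one halving period
print(projected_cost(100.0, 3.0))  # 25.0 after two
```

Over the five-year window to 2005 this compounds to roughly a factor of 10, which is why the later cost tables can project large capacity growth on modest budgets.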
Technology Trends & Choices

• For costing purposes
  – Start with familiar, established technologies
  – Project by observed exponential slopes
• Conservative approach
  – There are no known near-term show stoppers to the evolution of these established technologies
  – A new technology would have to be more cost effective to supplant the projection of an established technology
Technology Choices

• CPU-intensive processing
  – Farms of commodity processors - Intel/Linux
• I/O-intensive processing and serving
  – Mid-scale SMPs (Sun, IBM, etc.)
• Online storage (disk)
  – Fibre Channel connected RAID
• Nearline storage (robotic tape system)
  – STK / 9840 / HPSS
• LAN
  – Gigabit Ethernet
Requirements Profile

• Facilities ramp-up driven by...
  – Core software needs
    • ODBMS scalability tests in the '01-'02 time frame
  – Subdetector needs
    • Modest for the next few years
  – Mock Data Exercises - not officially scheduled, so...
    • Assume MDC I at 10% scale in 2003
    • and MDC II at 30% scale in 2004
  – Facilities model validation
Tier 1

• Full-function facility including...
  – Dedicated connectivity to CERN
  – Primary site for storage/serving
    • Cache/replicate CERN & other data needed by US ATLAS
  – Computation
    • Primary site for re-reconstruction (perhaps the only site)
    • Major site for simulation & analysis (~2 x a Tier 2)
  – Regional support, plus catchall for those without a region
  – Repository of technical expertise and support
    • Hardware, OSs, utilities, other standard elements of U.S. ATLAS
    • Network, AFS, GRID, & other infrastructure elements of the WAN model
Tier 1 (2)

• Commodity processor farms (Intel/Linux)
• Mid-scale SMP servers (Sun)
• Fibre Channel connected RAID disk
• Robotic tape / HSM system (STK / HPSS)

                         FY 1999  FY 2000  FY 2001  FY 2002  FY 2003  FY 2004  FY 2005  FY 2006
CPU - kSPECint95             0.2      0.5        1        3        6       17       50       83
Disk - TB                    0.2      0.2        2        5       13       34      100      169
     - MBytes/sec             40       40      256      607    1,310    2,481    4,999    6,756
Tertiary Storage - TB        1.0      5.0       11       20       34      101      304      607
     - MBytes/sec              -       20       20       65       65      166      394      622
Current Tier 1 Status
• U.S. ATLAS Tier 1 facility now operating as a ~5% adjunct to the RHIC Computing Facility, including
  – Intel/Linux farms (28 CPUs)
  – Sun E450 server (2 CPUs)
  – 200 GBytes of Fibre Channel RAID disk
  – Intel/Linux web server
  – Archiving via a low-priority HPSS Class of Service
  – Shared use of an AFS server (10 GBytes)

Current U.S. ATLAS Tier 1 Capacities
  Compute:               28 CPUs, 500 SPECint95
  Disk (Fibre Channel):  250 GBytes
  Sun server / NIC:      2 CPUs, 100 Mbit/sec
US ATLAS Tier 1 Facility - Current Configuration

[Diagram: ATLAS equipment - Intel/Linux farm nodes (dual 450 MHz, 512 MBytes, 18 GBytes disk; 4 of 14 operational on 100 Mbit Ethernet), an E450 NFS server (front line with SSH, Objectivity lock server, XXX.USATLAS.BNL.GOV), 200 GBytes of RAID disk on a SAN hub, and an Intel/Linux web server (128 MBytes, 18 GBytes); RCF infrastructure - LAN switch, backup server, HPSS archive server with 9840 tapes, and AFS servers (~10 GBytes of RAID disk for the Atlas AFS area, ~50 GBytes total). Software: LSF, AFS, Objectivity, Gnu, etc.]
RAID Disk Subsystem
Intel/Linux Processor Farm
Intel/Linux Nodes
Tier 1 Staffing Estimate

Subproject Description                    1999  2000  2001  2002  2003  2004  2005  2006
Detailed US ATLAS facilities planning        0     1     1   0.5   0.5   0.5   0.5   0.5
Participation in GRID development            0   0.5     1     1     1     1     1     1
US ATLAS GRID infrastructure                 0     0   0.5     1     2     2     3     3
General computing environment                1     1     1   1.5     2     3     4     4
CPU-intensive resources                      0   0.5   0.5   0.5     1   1.5     2     2
Analysis systems                             0     0   0.5     1   1.5   1.5   2.5   2.5
Online data servers & systems                0     0   0.5   0.5     1     1   1.5   1.5
HSM hardware                                 0   0.5     1     1     1     1     1     1
HSM software                                 0     0   0.5   0.5     1     1     2     2
LAN connectivity                             0     0     0   0.5   0.5     1     1     1
WAN connectivity                             0     0   0.5   0.5     1     1     1     1
Measure/monitor performance                  0     0   0.5     1     2     2   2.5   2.5
Physical & cyber security                    0     0     0   0.5   0.5     1     1     1
Management & administration                  0   0.5   0.5     1     1   1.5     2     2
Total                                        1     4     8    11    16    19    25    25
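The yearly totals in the staffing estimate can be checked by summing the subproject rows; a quick sketch with the row values transcribed from the table above:

```python
# FTEs per subproject for FY 1999-2006, transcribed from the
# Tier 1 staffing estimate above (one list per subproject row).
rows = [
    [0, 1,   1,   0.5, 0.5, 0.5, 0.5, 0.5],  # facilities planning
    [0, 0.5, 1,   1,   1,   1,   1,   1  ],  # GRID development
    [0, 0,   0.5, 1,   2,   2,   3,   3  ],  # GRID infrastructure
    [1, 1,   1,   1.5, 2,   3,   4,   4  ],  # computing environment
    [0, 0.5, 0.5, 0.5, 1,   1.5, 2,   2  ],  # CPU-intensive resources
    [0, 0,   0.5, 1,   1.5, 1.5, 2.5, 2.5],  # analysis systems
    [0, 0,   0.5, 0.5, 1,   1,   1.5, 1.5],  # online data servers
    [0, 0.5, 1,   1,   1,   1,   1,   1  ],  # HSM hardware
    [0, 0,   0.5, 0.5, 1,   1,   2,   2  ],  # HSM software
    [0, 0,   0,   0.5, 0.5, 1,   1,   1  ],  # LAN connectivity
    [0, 0,   0.5, 0.5, 1,   1,   1,   1  ],  # WAN connectivity
    [0, 0,   0.5, 1,   2,   2,   2.5, 2.5],  # measure/monitor performance
    [0, 0,   0,   0.5, 0.5, 1,   1,   1  ],  # physical & cyber security
    [0, 0.5, 0.5, 1,   1,   1.5, 2,   2  ],  # management & administration
]
# Sum each year's column across all subprojects.
totals = [sum(col) for col in zip(*rows)]
print(totals)  # [1, 4.0, 8.0, 11.0, 16.0, 19.0, 25.0, 25.0]
```

The column sums reproduce the Total row (1, 4, 8, 11, 16, 19, 25, 25), so the table is internally consistent.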
Tier 2 Ramp-up

• Assume 2 years for a Tier 2 to become fully established
  – Initiate first Tier 2 in 2001
    • True Tier 2 prototype
    • Demonstrate Tier 1 - Tier 2 interaction
  – Second Tier 2 initiated in 2002 (CERN?)
  – Four remaining initiated in 2003
• All fully operational by 2005
• Six are to be identical (CERN exception?)
Tier 2

• Limit personnel and maintenance support costs
• Focused-function facility
  – Excellent connectivity to Tier 1 (network + GRID)
  – Tertiary storage via network at Tier 1 (none local)
  – Primary analysis site for its region
  – Major simulation capabilities
  – Major online storage cache for its region
• Leverage local expertise and other resources
  – Part of site selection criteria
  – For example: ~1 FTE contributed
Tier 2 (2)

• Commodity processor farms (Intel/Linux)
• Mid-scale SMP servers
• Fibre Channel connected RAID disk

                    FY 1999  FY 2000  FY 2001  FY 2002  FY 2003  FY 2004  FY 2005  FY 2006
CPU - kSPECint95          -        -      1.0        2        2        5       15       25
Disk - TB                 -        -      0.6        2        2        5       15       25
     - MBytes/sec         -        -       71      247      248      697    2,256    3,887
Tier 1 / Tier 2 Staffing
(In Pseudo Detail)

[Table: FTEs by function - GRID / distributed system, computing environment, simulation/reconstruction systems, analysis systems, data storing & serving, network, measure & monitor performance, and management - for Tier 1, a typical Tier 2, and the total over 6 Tier 2s. Tier 1 totals 25 FTEs; the six Tier 2 centers total 18 FTEs. Assumes 1 FTE is contributed from the base program at a typical Tier 2, for a total of 6 contributed FTEs across the Tier 2s.]
Staff Evolution

US ATLAS Facilities Staffing (FTE's)

                           FY '99  FY '00  FY '01  FY '02  FY '03  FY '04  FY '05  FY '06
Tier 1
  Tier 1 Total                  1       4       8      11      16      19      25      25
Tier 2
  Initial-year center           -       -       1       2       2       2       2       2
  Second-year center            -       -       -       1       2       2       2       2
  4 final-year centers          -       -       -       -       4       8       8       8
  Tier 2 Total                  -       -       1       3       8      12      12      12
US ATLAS Facilities Total       1       4       9      14      24      31      37      37
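The Tier 2 rows are consistent with a staggered ramp (1 FTE in a center's first year, 2 FTEs thereafter, across the 2001/2002/2003 start cohorts); a sketch reproducing the totals from the rows transcribed above:

```python
# US ATLAS facilities staffing (FTEs) for FY 1999-2006, transcribed
# from the Staff Evolution table above.
tier1        = [1, 4, 8, 11, 16, 19, 25, 25]
initial_year = [0, 0, 1, 2, 2, 2, 2, 2]   # first center, started 2001
second_year  = [0, 0, 0, 1, 2, 2, 2, 2]   # second center, started 2002
final_four   = [0, 0, 0, 0, 4, 8, 8, 8]   # four centers started 2003

# Tier 2 total is the sum of the three cohorts; the facilities total
# adds Tier 1 on top.
tier2_total = [a + b + c for a, b, c in zip(initial_year, second_year, final_four)]
facilities  = [t1 + t2 for t1, t2 in zip(tier1, tier2_total)]

print(tier2_total)  # [0, 0, 1, 3, 8, 12, 12, 12]
print(facilities)   # [1, 4, 9, 14, 24, 31, 37, 37]
```

Both computed rows match the table, confirming the cohort arithmetic.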
Network

• Tier 1 connectivity to CERN and to the Tier 2s is critical to the facilities model
  – Must be adequate
  – Must be guaranteed and allocable (dedicated and differentiated)
  – Should grow with need; OC12 should be practical by 2005
  – While the estimate is highly uncertain, this cost must be covered in a distributed facilities plan
WAN Configurations and Cost
(FY 2000 k$)
                                        1999  2000  2001  2002  2003  2004  2005  2006
Tier 1 to CERN Link                        -     -     -     -    T3   OC3  OC12  OC12
Annual CERN Link Cost                      0     0     0     0   200   300   400   300
Number of Tier 2 to Tier 1 OC3 Links       0     0     1     2     5     4     0     0
Number of Tier 2 to Tier 1 OC12 Links      0     0     0     0     0     1     5     5
Estimated cost of domestic OC3           250   200   160   128   102    82    66    52
Estimated cost of domestic OC12          500   400   320   256   205   164   131   105
Total Domestic WAN Cost                    0     0   160   256   512   492   655   524
Total WAN Cost                             0     0   160   256   712   792  1055   824
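The domestic OC3/OC12 unit-cost rows above are consistent with a ~20% annual price decline from FY 1999 baselines of 250 k$ and 500 k$ (this decline rate is an inference from the numbers, not stated in the talk); a sketch reproducing them:

```python
# Hypothesized model behind the unit-cost rows above: a 20% annual
# price decline from a 1999 baseline, in FY 2000 k$.
def link_cost(base_k: float, year: int) -> int:
    """Domestic link cost in `year`, declining 20%/yr from the 1999 base."""
    return round(base_k * 0.8 ** (year - 1999))

oc3  = [link_cost(250, y) for y in range(1999, 2007)]
oc12 = [link_cost(500, y) for y in range(1999, 2007)]
print(oc3)   # [250, 200, 160, 128, 102, 82, 66, 52]
print(oc12)  # [500, 400, 320, 256, 205, 164, 131, 105]
```

Both rounded series match the table, so the tabulated costs follow a single exponential decline per link type.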
Capacities by Year

                                 FY 1999  FY 2000  FY 2001  FY 2002  FY 2003  FY 2004  FY 2005  FY 2006
Operational Tier 2 Facilities          -        -        1        2        6        6        6        6
CPU - kSPECint95
  Tier 1                             0.2        1        1        3        6       17       50       83
  Tier 2                               -        -        1        3       12       30       89      154
  Total CPU                          0.2        1        2        6       18       47      140      237
Disk - TB
  Tier 1                             0.2        0        2        5       13       34      100      169
  Tier 2                               -        -        1        3       12       28       89      147
  Total Disk                         0.2        0        3        8       25       62      189      316
Tape Storage (Tier 1) - TB
  Total Tape                           1        5       11       20       34      101      304      607
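The MDC scales assumed in the Requirements Profile (10% in 2003, 30% in 2004) can be cross-checked against this table: summing the Tier 1 and Tier 2 CPU rows gives the totals, and the 2003/2004 totals come to roughly 13% and 34% of the FY 2005 turn-on capacity. A sketch (values transcribed from the table; the computed FY 2005 total of 139 differs from the table's 140 only by rounding):

```python
# Total CPU (kSPECint95) by year, from the Capacities by Year table above.
years     = [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006]
cpu_tier1 = [0.2, 1, 1, 3, 6, 17, 50, 83]
cpu_tier2 = [0,   0, 1, 3, 12, 30, 89, 154]

cpu_total = [t1 + t2 for t1, t2 in zip(cpu_tier1, cpu_tier2)]
by_year = dict(zip(years, cpu_total))
turn_on = by_year[2005]  # FY 2005 turn-on capacity

print(turn_on)                              # 139 (table rounds to 140)
print(round(by_year[2003] / turn_on, 2))    # 0.13, vs. 10% assumed for MDC I
print(round(by_year[2004] / turn_on, 2))    # 0.34, vs. 30% assumed for MDC II
```

So the planned capacity ramp comfortably covers the assumed Mock Data Challenge scales.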
Annual Equipment Costs at Tier 1 Center
(FY 2000 k$)

                       FY '99  FY '00  FY '01  FY '02  FY '03  FY '04  FY '05  FY '06
Linux Farm                 16      40      25      60      70     150     300     200
SMP Servers                40       0      50       0     100       0     150       0
Disk Subsystem             30       0     118     120     240     400     860     600
Robotic System              0       0     125       0       0     125       0     125
Tape Drives                 0      50       0      50       0      50      75      50
Local Area Network          0       0      25      25      50      50     100      75
Media                       1      15      15      15      15      50     100     100
Desktops                    0      25      50      75      75      75      75      75
Hardware Maintenance        0      30      18      66      84     135     182     330
Software Licenses          10      20      30      60      90     350     250     250
Misc.                       5      20      40      60      90      90     100     100
Total                     102     200     496     531     814    1475    2192    1905
Overhead                  12%     12%     12%     12%     12%     12%     12%     12%
Total Cost                114     224     556     594     912    1653    2454    2134
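The Total row above can be reproduced by summing the line items; a sketch with the values transcribed from the table:

```python
# Tier 1 equipment line items (FY 2000 k$) for FY '99-'06,
# transcribed from the table above.
items = {
    "Linux Farm":           [16, 40, 25, 60, 70, 150, 300, 200],
    "SMP Servers":          [40, 0, 50, 0, 100, 0, 150, 0],
    "Disk Subsystem":       [30, 0, 118, 120, 240, 400, 860, 600],
    "Robotic System":       [0, 0, 125, 0, 0, 125, 0, 125],
    "Tape Drives":          [0, 50, 0, 50, 0, 50, 75, 50],
    "Local Area Network":   [0, 0, 25, 25, 50, 50, 100, 75],
    "Media":                [1, 15, 15, 15, 15, 50, 100, 100],
    "Desktops":             [0, 25, 50, 75, 75, 75, 75, 75],
    "Hardware Maintenance": [0, 30, 18, 66, 84, 135, 182, 330],
    "Software Licenses":    [10, 20, 30, 60, 90, 350, 250, 250],
    "Misc.":                [5, 20, 40, 60, 90, 90, 100, 100],
}
# Sum each fiscal-year column across all line items.
totals = [sum(col) for col in zip(*items.values())]
print(totals)  # [102, 200, 496, 531, 814, 1475, 2192, 1905]
```

The computed column sums match the table's Total row exactly; the Total Cost row then applies the 12% overhead.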
Annual Equipment Costs at Tier 2 Center
(FY 2000 k$)

                       FY '99  FY '00  FY '01  FY '02  FY '03  FY '04  FY '05  FY '06
Linux Farm                  0       0      50      33       0      50      90      70
SMP Servers                 0       0      40       0      40       0      60      40
Disk Subsystem              0       0      39      60       0      60     130      85
Local Area Network          0       0      15      15      15      15      15      15
Desktops                    0       0       8       8       8       8       8       8
Hardware Maintenance        0       0       0      14      25      34      31      50
Software Licenses           0       0      10      20      20      20      20      20
Misc.                       0       0      10      20      20      20      20      20
Total                       0       0     172     170     128     207     374     308
Overhead                  12%     12%     12%     12%     12%     12%     12%     12%
Total Cost                  0       0     193     191     144     231     419     345
US ATLAS Facilities Annual Costs
(FY 2000 k$)

                            FY '99  FY '00  FY '01  FY '02  FY '03  FY '04  FY '05  FY '06
Tier 1
  Equipment, etc.              110     220     560     590     910   1,650   2,450   2,130
  Personnel                     30     560   1,120   1,540   2,230   2,650   3,490   3,490
  Tier 1 Total                 150     780   1,670   2,130   3,150   4,310   5,950   5,620
Tier 2
  Equipment, etc.                -       -     190     380   1,150   1,380   2,580   2,110
  Personnel                      -       -     140     420   1,120   1,680   1,680   1,680
  Tier 2 Total                   -       -     330     800   2,270   3,060   4,260   3,790
Network
  Network Total                  -       -     160     260     710     790   1,060     820
US ATLAS Facilities Total      150     800   2,200   3,200   6,100   8,200  11,300  10,200
Major Milestones

Milestone Description                            Date
Selection of 1st Tier 2 site                     01-Oct-00
Procure Automated Tape Library (ATL)             01-Jun-01
Demo Tier 2 transparent use of Tier 1 HSM        01-Jan-02
Establish dedicated Tier 1 / CERN link           01-Jan-03
Select remaining (4) Tier 2 sites                01-Jan-03
Mock Data Challenge I (10% turn-on capacity)     01-May-03
Final commit to HSM                              01-Oct-03
Mock Data Challenge II (33% turn-on capacity)    01-Jun-04
Achieve turn-on capacities                       01-Jan-05