Data Handling at Fermilab and Plans for Worldwide Analysis
Vicky White
Computing Division and D0 Experiment, Fermilab
Outline
(I) -- Solutions already implemented, in use today - HEP experiments, Sloan Digital Sky Survey, theorists' Lattice Gauge computations; operational experience with the Mass Storage Component
(II) -- Solutions being implemented for Collider Run II with upgraded detectors (March 2001)
Building and testing data handling solutions for CDF and D0
(III) -- Moving onwards to the future - SDSS and the NSF KDI proposal, SANs, Particle Physics Data Grid, MONARC and planning for CMS
(IV) -- Conclusions
(I) Solutions already implemented
(the Hierarchical Mass Storage Component of them)
The ‘Old’ central mass storage system
FMSS quota usage updated at Sat Feb 5 01:00:00 CST 2000
Group    Exp      FMSS Quota (KB)     Used (KB)
g022     g022        40,000,000.0      11,057,801.7
ktev     ktev     6,000,000,000.0   5,615,922,181.0
sdss     sdss     2,000,000,000.0   1,985,392,386.0
canopy   canopy   5,200,000,000.0   5,033,291,488.0
mssg     mssg     1,000,000,000.0       4,800,833.0
e781     e781     3,000,000,000.0   2,911,541,573.0
e831     e831     2,000,000,000.0   1,712,487,988.0
minos    minos      250,000,000.0      23,601,888.0
cosmos   cosmos     100,000,000.0               0.0
e740     e740     4,000,000,000.0   4,637,814,378.0
cms      cms        800,000,000.0     662,489,480.0
auger    auger      150,000,000.0      80,381,914.0
btev     btev       200,000,000.0     116,132,322.0
e791     e791       300,000,000.0     260,439,481.0
e866     e866       200,000,000.0       4,801,515.0
e815     e815       400,000,000.0     373,635,826.0
hppc     hppc       914,400,000.0     170,923,724.0
e811     e811        50,000,000.0      19,215,485.0
e872     e872        75,000,000.0      64,752,717.0
theory   theory     102,400,000.0      55,218,237.0
e665     e665        20,480,000.0       4,737,660.0
(II) Building and testing data handling solutions for CDF and D0
the Problem; the Solutions - what and how
dealing with a worldwide collaboration
Run II - Petabytes Storage and Data Access problem
Category       Parameter            D0                     CDF
DAQ rates      Peak rate            53 Hz                  75 Hz
               Avg event size       250 KB                 250 KB
               Level 2 output       1000 Hz                300 Hz
               Max can log          Scalable               80 MB/s
Data storage   # of events          600 M/year             900 M/year
               RAW data             150 TB/year            250 TB/year
               Reconstructed tier   75 TB/year             135 TB/year
               Physics analysis
               summary tier         50 TB/year             79 TB/year
               Micro summary        3 TB/year              -
CPU            Recons/event         1000-2500 MIPS.s/ev    1200 MIPS.s/ev
               Reconstruction       34,000-83,000 MIPS     56,000 MIPS
               Analysis             60,000-80,000 MIPS     90,000 MIPS
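As a sanity check, the RAW-data rows are essentially the product of event count and average event size (a few lines of plain Python; all numbers taken from the table above):

    # Cross-check of the RAW data rows: events/year x average event size.
    KB, TB = 1e3, 1e12
    for expt, events_per_year, ev_size_kb in [("D0", 600e6, 250.0), ("CDF", 900e6, 250.0)]:
        raw_tb = events_per_year * ev_size_kb * KB / TB
        print(f"{expt}: {raw_tb:.0f} TB/year RAW")
    # D0: 150 TB/year, matching the table; CDF: 225 TB/year, close to the quoted 250 TB/year.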
The CDF Detector
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Past and Present Strategies for data processing/data handling
Use ‘Commodity’ components where possible: inexpensive CPUs in ‘farms’ for reconstruction processing, e.g. PCs; inexpensive (if somewhat unreliable) tape drives and media
Multi-vendor: IBM, SGI, DEC, SUN, Intel PCs
Use much Open Source software (Linux, GNU, tcl/tk, python, apache, CORBA implementations…)
Hierarchy of active data stores: Disk, Tape in Robot, Tape on Shelf
Careful placement and categorization of data on physical medium: optimize for future access patterns
Processing Farm
Several Processing Farms
D0 Data Access (Read and Write) Abstraction
[Diagram: Online Data Acquisition Computers, Reconstruction Processing Farms of Computers, Database Servers and Analysis Computers, connected through Network Fabric(s) and Data Movers to Tape Robot(s) and Tape Shelves]
Key factors:
a) Organization of data on tape to match access
b) Understanding and controlling access patterns
c) Disk caches for most frequently accessed data
d) Management of pass-through data disk buffers
e) Rate-adapting disk buffers where necessary
f) Scalability and robustness
g) Bookkeeping and more bookkeeping…
h) Distributed client/servers ---> worldwide solutions
Designing for 200MB/s in/out Robot
D0 - the Real System
CDF Run II Data Flow
[Data-flow diagram, labels: Write ~50 datasets, 10 Mbytes/sec; 75 Hz, 20 Mbytes/sec; FiberChannel connection; Read Primary datasets; Write Secondary datasets; 150 Mbytes/sec; Read data 1.5 Gbytes/sec; Read RAW data 20 Mbytes/sec; 30 Terabytes]
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Run II Data Access - strategies
data content for an event from different processing stages stored in different physical collections
‘tiers’ of data of different sizes and content - RAW, fully reconstructed, summary reconstructed, highly condensed summary, ntuples and meta-data
primarily file-oriented access mechanisms: fetch a whole collection of event data (i.e. 1 file ~ 1 GB), read through it and process it sequentially (see the sketch below)
optimize traversal of data & control access based on physics & user - not on file system
use relational databases (Oracle centrally) for file and event catalogs and other ‘detector conditions’ and calibration data (0.5 - 1 TB)
import simulated data (files and tapes) from MC
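The file-oriented pattern above amounts to a simple loop. A minimal Python sketch follows; every name in it (resolve_to_files, stage_from_tape, read_events) is an illustrative stand-in, not a real D0/CDF API:

    # Sketch: resolve a dataset to whole files, stage each one, read it sequentially.

    def resolve_to_files(dataset):
        # Stand-in for the catalog (Oracle) query that yields a list of files.
        return [f"{dataset}_{i:04d}.evt" for i in range(3)]

    def stage_from_tape(filename):
        # Stand-in for a robot/tape fetch into the disk cache.
        return f"/cache/{filename}"

    def read_events(path):
        # Stand-in for sequentially reading ~250 KB event records from a ~1 GB file.
        for event_number in range(5):
            yield {"file": path, "event": event_number}

    for f in resolve_to_files("raw_stream_b"):
        for event in read_events(stage_from_tape(f)):
            pass  # the user's analysis code sees one event at a time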
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Data Tiers for a single Event
RAW detector measurements                                250 KB
Reconstructed Data - Hits, Tracks, Clusters, Particles   ~350 KB
Summary Physics Objects                                  50-100 KB
Condensed summary physics data                           5-15 KB
Data Catalog entry                                       ~200 B
Data Streams and Data Tiers
Streaming the Data - optimize for data access traversal
Up-front physical data organization and clustering
Multiple streams written and read in parallel
Streams are physics based, unlike disk striping
D0 approach to streaming the data
CDF Data Streaming
also separate data into many physical streams
not ‘exclusive’ streams - data may be written to multiple physical streams
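A minimal Python sketch of non-exclusive streaming (stream names and trigger flags invented for illustration) - an event that satisfies several trigger conditions is written to every matching physical stream:

    # Non-exclusive streaming: one event may land in several physical streams.
    STREAMS = {
        "electron": lambda ev: ev["em_trigger"],
        "muon":     lambda ev: ev["mu_trigger"],
        "jets":     lambda ev: ev["jet_trigger"],
    }

    def route(event):
        # Return every stream whose condition the event satisfies.
        return [name for name, accepts in STREAMS.items() if accepts(event)]

    event = {"em_trigger": True, "mu_trigger": True, "jet_trigger": False}
    print(route(event))  # ['electron', 'muon'] -- one event, two streams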
Access to Objects - OO design
C++ Reconstruction and Analysis Programs: fully object-oriented design - STL, templates, reference-counted pointers (D0)
OO data model like an OODBMS: persistent objects inherit from a persistent class
Objects and Collections of Objects stored persistently to disk and tape, ‘flattened’ out to files in special HEP formats
d0om persistency package for D0 supports various external ‘flattened’ formats, including relational database; allows for the possibility of storing some ‘tiers’ of the data in an OO database if proven useful
ROOT (HEP analysis package) file format for CDF
Schema evolution can be tailored to need
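The 'persistent objects inherit from a persistent class' idea reduces to a few lines; the sketch below uses Python and pickle purely as stand-ins for the real C++ d0om machinery and its 'flattened' HEP formats:

    import pickle

    class Persistent:
        # Base class: anything inheriting from it can be flattened to a stream.
        def write(self, stream):
            pickle.dump(self, stream)
        @staticmethod
        def read(stream):
            return pickle.load(stream)

    class Track(Persistent):
        # A persistent object simply inherits from the persistent base class.
        def __init__(self, pt, eta):
            self.pt, self.eta = pt, eta

    with open("/tmp/track.dat", "wb") as out:
        Track(42.0, 1.1).write(out)
    with open("/tmp/track.dat", "rb") as inp:
        t = Persistent.read(inp)
    print(t.pt, t.eta)  # 42.0 1.1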
Object Databases/Strategies and Choices
An OODB more or less adopts the following ideas:
Objects represent entities and concepts from the application domain. Their behavior is defined by their associated methods, which can also be stored in the DB, thus making them ‘universal’ and available
Hierarchies of classes inherit behavior - to avoid storage of redundant information and improve simplicity, similar objects are grouped together
GOAL--- have full database capability available to any object which can be created in any (supported) language
DREAM -- minimum of work to store an object + DB provides query, security, integrity, backup, concurrency control, redundancy + has the performance of a hand-tuned object manager for your particular application
The “Natural Laws” of data storage and retrieval:
•The more I know about the data, the more likely - and the faster - it can be found
•The sooner I know what you want, the faster you will get it
•The less variety in the data you have, the more opportunities for optimization
•The less often you restructure the data, the less overhead in keeping track of it
•The more people, from more places, who want access to the data, the tougher the problem of serving them
•The more often you want to ask the same questions, the easier it will be to optimize for those ‘queries’
•It will be much faster to “give you what you stored” than to find some new pattern contained in several “things” that you stored
•The more complicated the pattern you search for, the longer the search will take
Object Lessons - Tom Love
characterization and “Natural Laws” from the book
Object Lessons - Lessons Learned in Object Oriented Development Projects by Tom Love
“You can never achieve maximum performance with a system designed for maximum flexibility”
CDF and D0 both chose performance over flexibility - at least for the bulk of the data
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Serial Media Working Group Report
Technology             $/drive   $/media   Size (GB)   $/GB   MB/s (str/ran)
Redwood (StorageTek)   80k       78        46.6        1.67   10/4
DD-2 (Ampex)           72k       72        46.6        1.54   14/8
3590 (IBM)             30k       54        10          5.4    8/5
DTF (Sony)             30k       80        42          1.9    12/6
Eliant (Exabyte)       1.7k      5.4       6.5         0.83   1/0.9
DLT7000 (Quantum)      5.5k      80        32.6        2.45   5/2
EXB-8900 (Exabyte)     3.5k      72        20          3.6    3/2
AIT-1 (Sony)           3.0k      72        25          2.88   3/2.7
Conclusions for Run II: decide in 1999, maintain options and flexibility, purchase multi-drive capable robot => Grau/EMASS
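The $/GB column is just the media price divided by the cartridge capacity; checking two rows in Python:

    # $/GB = media price / cartridge capacity, for two rows of the table.
    for tech, media_usd, size_gb in [("Redwood", 78, 46.6), ("DLT7000", 80, 32.6)]:
        print(f"{tech}: ${media_usd / size_gb:.2f}/GB")  # 1.67 and 2.45, as tabulated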
EMASS AML2 Robot - flexible media - up to 5000 cartridges per tower
One for each - CDF and D0
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Storage Management Software Requirements
Robot Tape Library is not an archive but rather an active store. We therefore need to:
control placement of data on tapes
write RAW data from DAQ reliably and with absolute priority
exchange tapes frequently between robot and shelf
use open tape format and provide packages to read/write tapes
mark files and groups of files as read only
control robot arm and tape bandwidth according to access mode, project, user, etc.
keep the system up 24x7
access files from many different vendor machines, including PC/Linux, without software licensing issues
Unable to assure ourselves that the necessary HPSS modifications and enhancements would be available for Fall ‘99, we at Fermilab decided to build a more agile and flexible system, modeled on that of DESY.
ENSTORE storage management for Run II
Client: encp [options] <source> <destination>
[Architecture diagram: client encp commands talk to the Enstore servers, which control Movers on the data path to the media via the ftt tape I/O layer; a PNFS server host (from DESY) presents the /pnfs namespace (admin, usr, …) - the “Perfectly Normal File System”; Enstore is the replacement for OSM]
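A hypothetical Python wrapper around the encp client shown above (the /pnfs path is illustrative, and encp's option set is not spelled out here):

    import subprocess

    def enstore_copy(source, destination):
        # Shell out to encp, the Enstore copy client; raise on a failed transfer.
        subprocess.run(["encp", source, destination], check=True)

    # Illustrative usage: a /pnfs path names a file in the tape-backed namespace.
    # enstore_copy("run123456_raw.dat", "/pnfs/sam/raw/run123456_raw.dat")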
ENSTORE system in operation today
Data Catalog support for 1 million files and 16,000 volumes tested
Integrated Fermilab-written tape I/O package - ftt - for tape handling; supports error handling and statistics
Scalability looks good -- achieved 20 MB/sec into an Origin 2000 using just one Gbit Ethernet, and also into the Farms. Would have produced graphs of 50 MB/sec with 3 Gbit Ethernets if a Cisco switch had not broken
Working on robustness -- mainly of the hardware
Because of their overall strategy for tape and disk - planning Storage Area Networks for disk, and preferring directly connected, separate tape drives for their Farms and Central Analysis Server - CDF does not use Enstore.
CDF has built its own tape staging package on mt_tools and the same underlying ftt tape I/O package
GB/day - 3494 Robot + HPSS
[Plot: Fermilab Central Mass Storage System Utilization - Gigabytes Transferred, March 16th - August 24th, 1998; daily Write, Read and I/O Rate curves, left axis 0-1500 GB/day, right axis 0-16 MB/sec]
Average is 3 MB/sec
Max. sustained 23 MB/sec
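Converting the quoted rates into the GB/day units of the plot:

    # MB/sec sustained over a day, expressed in GB/day.
    SECONDS_PER_DAY = 86400
    for label, mb_per_s in [("average", 3), ("max sustained", 23)]:
        print(f"{label}: {mb_per_s * SECONDS_PER_DAY / 1000:.0f} GB/day")
    # average: 259 GB/day; max sustained: 1987 GB/day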
Some recent Enstore statistics from the web
http://www-d0en.fnal.gov/enstore
All the Enstore mover nodes kept busy by the D0 SAM data access system
CDF disk to tape
We have invented a “poor man’s” SAN for read-only disk in a heterogeneous environment
Suitable for static datasets that change infrequently
Use the ISO-9660 file system used by CD-ROMs.
We have verified that the UNIX systems of interest (SGI, SUN) are able to format a disk using the ISO-9660 format, put data on it and read the data from multiple systems
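One way to produce such a read-only ISO-9660 volume with standard tools is sketched below (mkisofs with Rock Ridge extensions for long UNIX names; paths are illustrative, and this is not necessarily the exact CDF procedure):

    import subprocess

    def make_static_dataset_image(dataset_dir, image_path):
        # Build an ISO-9660 filesystem image from a directory of static files.
        # Once written to shared disk, any host that mounts ISO-9660 can read it.
        subprocess.run(["mkisofs", "-R", "-o", image_path, dataset_dir], check=True)

    # make_static_dataset_image("/data/static_dataset", "/tmp/dataset.iso")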
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Experiment Data Access Software
Define collection of data to be processed: specify by Data Tier, Data Stream, Triggers, Run Ranges, Specific Files or Events, etc.
Resolve to List of Files: use the Oracle Relational Database query engine
Intelligent movement of data: optimize traversal of data; regulate use of the robot for different purposes and access modes; implement disk cache retention policies
•SAM (Sequential Access Model) system for D0
•CDF’s smallest unit is a Fileset, and they use only this to optimize tape access and minimize robot arm use
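A toy version of the 'resolve to list of files' step as a relational query; the schema is invented for illustration, and sqlite3 stands in for the production Oracle server:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE data_files (name TEXT, tier TEXT, stream TEXT, run INTEGER)")
    db.execute("INSERT INTO data_files VALUES ('f0001.raw', 'RAW', 'muon', 1005)")

    def resolve(tier, stream, run_lo, run_hi):
        # Turn a dataset specification (tier, stream, run range) into file names.
        rows = db.execute(
            "SELECT name FROM data_files"
            " WHERE tier = ? AND stream = ? AND run BETWEEN ? AND ?",
            (tier, stream, run_lo, run_hi))
        return [name for (name,) in rows]

    print(resolve("RAW", "muon", 1000, 1100))  # ['f0001.raw']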
D0 SAM - CORBA based framework
[Diagram: networked clients and servers; ENSTORE - robot, tape drives and movers; a global optimizer for robot file fetching and a regulator of robot/tape access according to access pattern; “Stations” -- logical or physical groupings of resources]
SAM in use
simple command line interface (+ some GUIs and web browsers), e.g.
  sam define project --defname=myproject -- …
  sam store --filename=xxxx --descrip=metadata-file ….
transparently integrated into the D0 framework and d0om file name expanders
one consumer can have many processes all helping ‘consume’ delivered files -- supports Farm production processing without additional bookkeeping (see the sketch below)
distributed disk caches and various ‘physics group driven’ caching policies
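A sketch of the one-consumer, many-processes pattern (illustrative Python, not the real SAM API): files arrive on a shared queue as they are delivered, and any number of worker processes drain it, so the farm needs no extra bookkeeping:

    from multiprocessing import Process, Queue

    def worker(queue):
        while True:
            filename = queue.get()
            if filename is None:   # sentinel: no more files in this project
                break
            print("processing", filename)

    if __name__ == "__main__":
        delivered = Queue()
        workers = [Process(target=worker, args=(delivered,)) for _ in range(4)]
        for w in workers:
            w.start()
        for f in ["f1.raw", "f2.raw", "f3.raw"]:  # files as they are delivered
            delivered.put(f)
        for _ in workers:
            delivered.put(None)
        for w in workers:
            w.join()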
1000+ Monte Carlo Files stored using SAM - reading them back
CDF Data Access - Stagers and Disk Inventory Manager
Resource Management using Batch system and static number of tape drives
File Caching
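File caching of this kind needs a retention policy; below is a minimal least-recently-used sketch in Python (purely illustrative, not the actual Disk Inventory Manager logic):

    from collections import OrderedDict

    class DiskCache:
        def __init__(self, capacity_gb):
            self.capacity, self.used = capacity_gb, 0.0
            self.files = OrderedDict()      # filename -> size, kept in LRU order

        def touch(self, name):
            self.files.move_to_end(name)    # mark as most recently used

        def add(self, name, size_gb):
            # Evict least recently used files until the new file fits.
            while self.used + size_gb > self.capacity and self.files:
                _, freed = self.files.popitem(last=False)
                self.used -= freed
            self.files[name] = size_gb
            self.used += size_gb

    cache = DiskCache(capacity_gb=2.0)
    cache.add("a.raw", 1.0)
    cache.add("b.raw", 1.0)
    cache.touch("a.raw")                    # "a" is now most recently used
    cache.add("c.raw", 1.0)                 # evicts "b", the LRU file
    print(list(cache.files))                # ['a.raw', 'c.raw']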
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Example File and Event Catalog for Run II
Oracle 8 database ==> 0.5 - 1 TB for D0, including detector run conditions and calibration data
1.8 x 10**9 event metadata entries, bit indexes, own data types; several million file entries
Oracle network sitewide licence - now on Linux too
SAM system uses a CORBA interface between components, including to database servers
CDF user processes consult the database directly
Data files catalogued and related to: runs and run conditions; luminosity information about the accelerator; the processes which produced (and consumed) the data; detector geometry, alignment and calibration data
Persistent Data for all the behaviors of the system and the data itself
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Scalability, Robustness, Availability
SAM is now starting serious stress testing for high throughput, high availability and good error handling
Database used to store context for recovery in SAM
We are learning!
“Oracle 24X7 - Real World Approaches to Ensuring Database Availability” -- we need to start to think like this
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Data From Remote sites -- IN2P3, Nikhef, Prague, Texas….
[Diagram: data exchange between a remote site and Fermilab through SAM and Enstore - project request, file request, data transfer and synchronized access; volume info; SAM metadata export and file import; Enstore metadata for tape import and tape export]
Access from Remote Sites
SAM designed as a distributed caching system - a file can have multiple locations in the database; can use the central Fermilab database -- or extracts of it in a local Linux Oracle Server
CDF expects to have local versions of their DH system running at non-Fermilab institutions
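A sketch of the multiple-locations idea (site and path names invented for illustration): the catalog maps each file to its replicas and a client prefers a copy at its own site:

    # Catalog: one file, several locations.
    locations = {
        "f0001.raw": ["fnal:/pnfs/sam/f0001.raw", "in2p3:/cache/f0001.raw"],
    }

    def best_replica(filename, my_site):
        replicas = locations[filename]
        local = [r for r in replicas if r.startswith(my_site + ":")]
        return (local or replicas)[0]        # prefer a local copy, else any copy

    print(best_replica("f0001.raw", "in2p3"))  # in2p3:/cache/f0001.raw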
Key Elements of Run II Data Handling
What is our overall strategy for the DH system?
How do we physically organize the data?
On what do we store it - where?
How do we migrate between parts of the storage hierarchy?
How do we provide intelligent and controlled access for large numbers of scientists?
… and track all the processing steps
How do we make it scalable, robust, available?
How do we work with the data at remote sites?
What are we learning for the next generation expts?
Lessons for future experiments?
Draw your own conclusions so far
We will tell you next year!
(III) Moving onwards - to the future
SDSS and NSF/KDI proposal
Storage Area Networks?
Particle Physics Data Grid
CMS and Worldwide Collaboration
Next generation Storage Systems?
The Sloan Digital Sky Survey
A project run by the Astrophysical Research Consortium (ARC)
Goal: to create a detailed multicolor map of the Northern Sky over 5 years, with a budget of approximately $80M
Data Size: 40 TB raw, 1 TB processed
Participants: The University of Chicago, Princeton University, The Johns Hopkins University, The University of Washington, Fermi National Accelerator Laboratory, US Naval Observatory, The Japanese Participation Group, The Institute for Advanced Study, Max Planck Inst. Heidelberg
Funding: SLOAN Foundation, NSF, DOE, NASA
SDSS Data Flow
Geometric Indexing
“Divide and Conquer” Partitioning
Attributes          Number
Sky Position        3
Multiband Fluxes    N = 5+
Other               M = 100+
[Diagram: the 3 + N + M attributes are partitioned - sky position indexed with a Hierarchical Triangular Mesh; fluxes split as a k-d tree and stored as an r-tree of bounding boxes; the remaining attributes use regular indexing techniques]
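The Hierarchical Triangular Mesh can be sketched compactly: each spherical triangle splits into four children through its edge midpoints, and a sky position is indexed by the path of children containing it. The sketch below is a simplified illustration in Python/numpy, not the production SDSS code:

    import numpy as np

    def midpoint(a, b):
        m = (a + b) / 2.0
        return m / np.linalg.norm(m)        # project back onto the unit sphere

    def children(tri):
        a, b, c = tri
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

    def contains(tri, p):
        # p lies inside a spherical triangle if it is on the inner side of each edge.
        a, b, c = tri
        return all(np.dot(np.cross(u, v), p) >= 0 for u, v in ((a, b), (b, c), (c, a)))

    def htm_index(tri, p, depth):
        path = []
        for _ in range(depth):
            for i, child in enumerate(children(tri)):
                if contains(child, p):
                    path.append(i)
                    tri = child
                    break
        return path

    octant = tuple(np.eye(3))               # one octant of the unit sphere
    p = np.array([1.0, 1.0, 1.0]); p /= np.linalg.norm(p)
    print(htm_index(octant, p, depth=3))    # [3, 3, 3]: the central child each time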
SDSS Data Products
All raw data saved in a tape vault at Fermilab
Object catalog          400 GB    parameters of >10^8 objects
Redshift Catalog        1 GB      parameters of 10^6 objects
Atlas Images            1.5 TB    5 color cutouts of >10^8 objects
Spectra                 60 GB     in a one-dimensional form
Derived Catalogs        20 GB     clusters; QSO absorption lines
4x4 Pixel All-Sky Map   60 GB     heavily compressed
SDSS Distributed Collaboration
[Map: Apache Point Observatory (NMSU), U. Washington, U. Chicago, Fermilab, USNO, JHU, Institute for Advanced Study, Princeton U. and Japan, linked via VBNS and ESNET]
NSF/KDI -- Analysis Data Grid
Collaboration with the Analysis Data Grid: a proposal to the NSF KDI program by
JHU, Fermilab and Caltech (H. Newman, J. Bunn) +
Objectivity, Intel and Microsoft (Jim Gray)
Involves computer scientists, astronomers and particle physicists
Accessing Large Distributed Archives in Astronomy and Particle Physics
experiment with scalable server architectures,
create middleware of intelligent query agents,
apply to both particle physics and astrophysics data sets
Status: 3-year proposal just funded
Particle Physics Data Grid: http://grid.fnal.gov/ppdg
Initial Testbed Applications
High-Speed Site-to-Site File Replication Service: a Bulk Transfer Service at 100 Mbytes/s, 100 Tbytes/year between a Primary Site (Data Acquisition, CPU, Disk, Tape-Robot) and a (Partial) Replica Site (CPU, Disk, Tape-Robot)
Multi-Site Cached File Access: a Primary Site (Data Acquisition, CPU, Disk, Tape-Robot) serving Satellite Sites (CPU, Disk, Tape-Robot) and Universities (CPU, Disk, Users)
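A sketch of the multi-site cached file access pattern (illustrative classes, not PPDG software): a satellite site serves repeat requests from its local cache and only goes back to the primary site on a miss:

    class Site:
        def __init__(self, name, upstream=None):
            self.name, self.upstream, self.cache = name, upstream, {}

        def get(self, filename):
            if filename in self.cache:
                return self.cache[filename]     # cache hit: no wide-area transfer
            data = self.upstream.get(filename)  # miss: fetch from the primary site
            self.cache[filename] = data         # keep a local replica
            return data

    class PrimarySite(Site):
        def __init__(self):
            super().__init__("primary")
            self.tape = {"f0001.raw": b"..."}   # robot-resident data (stub)

        def get(self, filename):
            return self.tape[filename]

    satellite = Site("satellite", upstream=PrimarySite())
    satellite.get("f0001.raw")   # first access: transferred from the primary site
    satellite.get("f0001.raw")   # second access: served from the local cache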
Bulk file transfer testbed -- Focuscopy
[Diagram: Fermilab and Indiana U. linked over MREN, with a disk cache, a SAM metadata catalog (file location, statistics, engine status), the HPSS tape library, and operator-mounted Exabyte tapes]
Bulk file transfer testbed -- awaits ESNet research network and QOS
Distributed Cache - combining SAM and Condor
Matchmaking: Distributed Resource Management for High Throughput Computing, Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, July 28-31, 1998, Chicago, IL.
next project --- Objectivity database caching with Caltech and ANL?
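The matchmaking idea of the cited paper, reduced to a toy Python example (attributes invented for illustration): jobs and resources advertise properties, and the matchmaker prefers a resource that already caches the job's input file:

    resources = [
        {"name": "farm-01", "cached_files": {"f0001.raw"}, "free_cpu": True},
        {"name": "farm-02", "cached_files": set(),         "free_cpu": True},
    ]

    def match(job, pool):
        # Among free resources, prefer one that already holds the job's input.
        candidates = [r for r in pool if r["free_cpu"]]
        with_data = [r for r in candidates if job["input"] in r["cached_files"]]
        return (with_data or candidates)[0]["name"]

    print(match({"input": "f0001.raw"}, resources))  # farm-01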
Storage Area Networks (SANs) - where are we?
Heterogeneous cluster of machines - not locked to one vendor, competitive bids for computing.
Requires high-bandwidth access to shared disk storage to work effectively - NFS and AFS are not sufficiently high-performance.
Use Fiber Channel as the physical layer and run SCSI over it
Unfortunately read/write to Fiber Channel disks in a heterogeneous environment is not currently available at an affordable cost
Proposal from Quantum Research -- unfunded
1st phase of research was quite successful
LHC Experiment and other future experiment Data Access Architectures
Scale and complexity: numbers of channels of detectors; number of participants and geographic dispersion; complexity of collisions
Network bandwidth hopes: distributed store of data, rather than data replication
Hierarchical Storage Systems evolution: HPSS collaboration - Fermilab continues involvement; CERN/DESY/Fermilab/Eurostore?
Disk availability/price: all data on disk? => random access to sub-parts of events, with less attention to clustering of data on physical medium
Object oriented database technology: find the right places for it
MONARC Analysis Model Baseline: ATLAS or CMS “Typical” Tier1 RC
CPU Power             ~100 KSI95
Disk space            ~100 TB
Tape capacity         300 TB, 100 MB/sec
Link speed to Tier2   10 MB/sec (1/2 of 155 Mbps)
Raw data              1%      10-15 TB/year
ESD data              100%    100-150 TB/year
Selected ESD          25%     5 TB/year [*]
Revised ESD           25%     10 TB/year [*]
AOD data              100%    2 TB/year [**]
Revised AOD           100%    4 TB/year [**]
TAG/DPD               100%    200 GB/year
Simulated data        25%     25 TB/year (repository)
[*] Covering five Analysis Groups, each selecting ~1% of total ESD or AOD data for a typical analysis
[**] Covering all Analysis Groups
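A quick feasibility check on the Tier2 link figure above:

    # What can a 10 MB/sec link carry in a year, versus the ESD volume it serves?
    SECONDS_PER_YEAR = 3600 * 24 * 365
    link_tb_per_year = 10e6 * SECONDS_PER_YEAR / 1e12
    print(f"10 MB/sec sustained ~ {link_tb_per_year:.0f} TB/year")  # ~315 TB/year
    # Comfortably above the 100-150 TB/year of ESD data listed above.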
MONARC Testbed Systems
Regional Center Architecture - Example by I. Gaines
[Diagram: tape mass storage & disk servers and database servers, fed by tapes, by the network from CERN, and by the network from Tier 2 & simulation centers; Production Reconstruction (Raw/Sim -> ESD; scheduled, predictable; experiment/physics groups); Production Analysis (ESD -> AOD, AOD -> DPD; scheduled; physics groups); Individual Analysis (AOD -> DPD and plots; chaotic; physicists' desktops); support services - R&D systems and testbeds, info servers, code servers, web servers, telepresence servers, physics software development, training, consulting, help desk; connections out to Tier 2 centers, local institutes, CERN and tapes]
UF Equipment Plan for FY00
R&D and User support, UF Hardware:
disk storage: 1 TByte - large disk pool for ODBMS testbeds and data analysis (Monte Carlo and test beam data)
tape storage: up to 10 TByte - provide several TB storage for MC and test beam data; set up an ODBMS testbed; start using Objectivity + mass storage system in analysis; provide a data import and export facility
CPU resources: 30 node Linux cluster - increase the main server; form a production unit for MC production; PC analysis cluster and dedicated special-purpose R&D systems
Network infrastructure: provide sufficient LAN capacity; provide WAN connectivity for production and testbed activities
Conclusions
If you have a lot of data to manage and access today, you must:
think carefully about how you store it, how you wish to access it, and how you will control access to it
be aware of media costs
design a system for robustness and uptime (especially if you use relatively inexpensive tape media)
design a system for active and managed access to all hierarchies of storage - disk, tape in robot & tape on shelf
For the next generation of experiments:
we hope for better network bandwidth and a truly distributed system
we investigate OO databases for their potential to provide random access to sub-parts of event data
THE END