Date posted: 08-Jun-2015
04/13/2023
High Energy Physics Data Management using Cloud Computing
Analysis of the BaBar Experiment Data Handling
Paper by: Abhishek Dey, CSE 2nd year | Diya Ghosh, CSE 2nd year | Mr. Somenath Roy Chowdhury
Contents
Motivation
HEP Legacy Project
CANFAR Astronomical Research Facility
System Architecture
Operational Experience
Summary
What exactly is BaBar?
Its design was motivated by the investigation of CP violation.
Set up to understand the disparity between the matter and antimatter content of the universe by measuring CP violation.
BaBar focuses on the study of CP violation in the B meson system.
The name comes from the nomenclature for the B meson (symbol B) and its antiparticle (symbol B̄, pronounced "B bar").
04/13/2023
4
BaBar: A Data Point of View
9.5 million lines of C++ and Fortran.
Compiled size is 30 GB.
A significant amount of manpower is required to maintain the software.
Each installation must be validated before the results it generates are accepted.
CANFAR is a partnership between:
– University of Victoria
– University of British Columbia
– National Research Council, Canadian Astronomy Data Centre
– Herzberg Institute for Astrophysics
CANFAR provides the infrastructure for running VMs.
Need for Cloud Computing:
Jobs are embarrassingly parallel, much like HEP workloads.
Each of these surveys requires a different processing environment, including:
– A specific version of a Linux distribution
– A specific compiler version
– Specific libraries
Applications have little documentation.
These environments are evolving rapidly.
Data is precious, too precious to lose.
We need infrastructure, and it comes easily as a service (IaaS).
A word about Cloud Computing:
IaaS: What next?
With IaaS, we can easily create many instances of a VM image.
How do we manage the VMs once they are booted?
How do we get jobs to the VMs?
Our Solution: Cloud Scheduler + Condor
Users create a VM with their experiment software installed.
A basic VM is created by one group, and users add on their analysis or processing software to create their custom VM.
Users then create batch jobs as they would on a regular cluster, but they specify which VM image should run their jobs.
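The workflow above can be sketched as a Condor submit description. The VM-related attribute names below (`+VMType`, `+VMLoc`) follow the convention Cloud Scheduler uses for tagging jobs with the VM image they need, but treat the exact names, image name, and URL as illustrative assumptions, not the paper's verbatim configuration:

```
# submit.jdl: a batch job that requests a user-built analysis VM
Universe     = vanilla
Executable   = analysis.sh
Log          = job.log
Output       = job.out
Error        = job.err

# Hypothetical Cloud Scheduler attributes: which VM image to boot
+VMType      = "babar-analysis"
+VMLoc       = "http://repo.example.org/vms/babar-analysis.img.gz"
Requirements = VMType =?= "babar-analysis"

Queue
```

Cloud Scheduler watches the Condor queue, boots the requested VM on the IaaS cloud, and Condor then dispatches the job to it exactly as on a regular cluster.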
Steps for a successful architecture setup:
CANFAR: MAssive Compact Halo Objects (MACHO)
A detailed re-analysis of data from the MACHO experiment's dark matter search.
Jobs perform a wget to retrieve the input data (~40 MB) and have a 4-6 hour run time. Low I/O makes them a great fit for clouds.
Astronomers are happy with the environment.
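A rough check of why these jobs suit clouds: compare the time spent fetching input against the compute time. The bandwidth figure is an assumption for illustration, not from the paper:

```python
# I/O vs. compute for a MACHO re-analysis job (illustrative numbers)
INPUT_BYTES = 40 * 10**6        # ~40 MB input per job, fetched via wget
BANDWIDTH   = 10 * 10**6        # assume 10 MB/s into the cloud VM
RUN_SECONDS = 5 * 3600          # 4-6 hour run time; take 5 h

fetch_s = INPUT_BYTES / BANDWIDTH              # 4 seconds of transfer
io_fraction = fetch_s / (fetch_s + RUN_SECONDS)
print(f"I/O is {io_fraction:.4%} of the job")  # well under 0.1%
```

Even with a pessimistic link, transfer time is negligible next to the run time, so the jobs lose almost nothing to the network-distant storage typical of IaaS clouds.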
Data Handling in BaBar:
(Diagram: analysis jobs consume event data, both real and simulated, together with the BaBar configuration and conditions databases.)
The data is approximately 2 PB.
The file system is hosted on a cluster of six nodes: one Management/Metadata Server (MGS/MDS) and five Object Storage Servers (OSS).
Each node uses a single gigabit interface/VLAN to communicate both internally and externally.
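A back-of-the-envelope calculation shows why the per-node gigabit link matters at this scale. The figures below are the slide's own (2 PB, 1 Gbit/s); everything else is simple arithmetic:

```python
# How long would it take to stream the ~2 PB dataset through a
# single 1 Gbit/s interface, the per-node link described above?
DATA_BYTES       = 2 * 10**15        # ~2 PB
LINK_BITS_PER_S  = 10**9             # one gigabit interface
link_bytes_per_s = LINK_BITS_PER_S / 8

seconds = DATA_BYTES / link_bytes_per_s
days = seconds / 86400
print(f"{days:.0f} days")            # roughly half a year per link
```

Serving the dataset at a useful rate therefore depends on spreading reads across all five OSS nodes and their links, not on any single interface.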
Xrootd: The Need for Distributed Data
Xrootd is a file server providing byte-level access to data; it is used by many high-energy physics experiments.
It provides access to the distributed data.
A read-ahead value of 1 MB and a read-ahead cache size of 10 MB were set on each Xrootd client.
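The read-ahead settings above are applied on the client side. As one illustration (an assumption about the setup, not taken from the paper), ROOT's xrootd client reads such values, in bytes, from its `.rootrc` resource file:

```
# .rootrc fragment: tune the xrootd client's read-ahead behaviour
XNet.ReadAheadSize:   1048576      # 1 MB read-ahead per request
XNet.ReadCacheSize:   10485760     # 10 MB client-side read-ahead cache
```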
How does a DFS work?
Blocks are replicated across several datanodes (usually 3).
A single namenode stores the metadata (file names, block locations, etc.).
Optimized for large files and sequential reads.
Clients read from the closest available replica (note: locality of reference).
If the replication for a block drops below the target, it is automatically re-replicated.
(Diagram: a single namenode plus five datanodes, with blocks 1-4 each replicated across three datanodes.)
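The re-replication rule above can be sketched in a few lines. This is a minimal illustration of the idea, not HDFS code; all names are hypothetical:

```python
# Namenode-style re-replication: if a block's replica count drops
# below the target, place new replicas on datanodes lacking the block.
TARGET = 3

def re_replicate(block_map, datanodes):
    """block_map: block id -> set of datanodes holding a replica."""
    actions = []
    for block, holders in block_map.items():
        missing = TARGET - len(holders)
        # candidate nodes that do not already hold this block
        candidates = [n for n in datanodes if n not in holders]
        for node in candidates[:missing]:
            holders.add(node)
            actions.append((block, node))
    return actions

# A datanode failure has dropped block "b2" to a single replica:
blocks = {"b1": {"dn1", "dn2", "dn3"}, "b2": {"dn4"}}
actions = re_replicate(blocks, ["dn1", "dn2", "dn3", "dn4", "dn5"])
print(actions)   # "b2" gains two new replicas; "b1" is untouched
```

A real namenode would also spread replicas across racks and throttle the copy traffic; the sketch only shows the trigger condition and placement constraint.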
Results and Analysis:
Fault-tolerant model:
Acknowledgements
A special word of appreciation and thanks to Mr. Somenath Roy Chowdhury.
My heartiest thanks to the entire team who worked hard to build the cloud.
Questions, please?