Computing in the PHENIX experiment and our experience in Korea
Presented by H.J. Kim, Yonsei University
On behalf of the HEP Data Grid Working Group
The Second International Workshop on HEP Data Grid
CHEP, KNU Aug. 22-23, 2003
RHIC
Configuration: two concentric superconducting magnet rings (3.8 km circumference) with 6 interaction regions
Ion beams: Au + Au (or d + A), √s = 200 GeV per nucleon pair, luminosity = 2×10^26 cm^-2 s^-1
Polarized protons: p + p, √s = 500 GeV, luminosity = 1.4×10^31 cm^-2 s^-1
Experiments: PHENIX, STAR, PHOBOS, BRAHMS
PHENIX Experiment
4 spectrometer arms
12 Detector subsystems
350,000 detector channels
Event size typically ~90 kB
Event rate 1.2-1.6 kHz (2 kHz observed)
Typical data rate 100-130 MB/s
Expected duty cycle ~50% -> ~4 TB/day
We store the data in STK Tape Silos with HPSS
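As a rough consistency check of the quoted daily volume (assuming the typical ~100 MB/s rate and ~50% duty cycle above):
$0.1\ \mathrm{GB/s} \times 0.5 \times 86400\ \mathrm{s/day} \approx 4.3\ \mathrm{TB/day}$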
Physics Goals
Search for the Quark-Gluon Plasma; Hard Scattering Processes; Spin Physics
RCF
RHIC Computing Facility (RCF) provides computing facilities for four RHIC experiments (PHENIX, STAR, PHOBOS, BRAHMS).
RCF typically receives ~30 MB/s (a few TB/day) from the PHENIX counting house alone, over a Gigabit network. RCF therefore needs sophisticated data storage and data handling systems.
RCF has established an AFS cell for sharing files with remote institutions; NFS is the primary means through which data is made available to users at RCF.
A similar facility has been established at RIKEN (CC-J) as a regional computing center for PHENIX.
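For scale, a hedged back-of-the-envelope conversion of the rate above:
$30\ \mathrm{MB/s} \times 86400\ \mathrm{s/day} \approx 2.6\ \mathrm{TB/day}$,
consistent with the "few TB/day" figure.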
Grid Configuration concept in RCF
[Diagram: grid job requests are submitted over the Internet to a gatekeeper / job manager, which dispatches the jobs to an LSF Grid Cluster (LSFServer1, LSFServer2) with access to local disks (620 GB) and HPSS; the data path is labeled 30 MB/sec.]
PHENIX Computing Environment
Linux OS with the ROOT framework and PHOOL (PHenix Object Oriented Library), a C++ class library
[Data-flow diagram: raw data flows from the Counting House to HPSS and a large buffer disk; the Reconstruction Farm, with its local disks, reads calibrations & run info from the database and writes DST data to a big disk; mining & staging moves raw data and DSTs between HPSS and the buffer disks for analysis jobs, which also use a Tag DB.]
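As a minimal sketch of what an analysis job in this ROOT-based environment can look like: the file, tree, and branch names below are hypothetical, and this is plain ROOT rather than the actual PHOOL interface.

  // read_dst.C -- minimal ROOT macro; file/tree/branch names are hypothetical
  #include "TFile.h"
  #include "TTree.h"
  #include "TH1F.h"

  void read_dst()
  {
    // Open a (hypothetical) DST file staged from HPSS to local disk
    TFile *f = TFile::Open("dst_run12345.root");
    if (!f || f->IsZombie()) return;

    // Attach to a (hypothetical) event tree and one of its branches
    TTree *t = (TTree *)f->Get("dstTree");
    float pt = 0;
    t->SetBranchAddress("track_pt", &pt);

    // Fill a simple spectrum over all events
    TH1F *h = new TH1F("h_pt", "track p_{T};p_{T} (GeV/c);counts", 100, 0., 10.);
    const Long64_t n = t->GetEntries();
    for (Long64_t i = 0; i < n; ++i) {
      t->GetEntry(i);
      h->Fill(pt);
    }
    h->Draw();
  }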
PHENIX DAQ Room
A picture from the PHENIX Webcam
PHENIX Computing Resources
Locally in the countinghouse:
•~110 Linux machines of different ages (26 dual 2.4 GHz, many 900-1600 MHz machines), loosely based on RedHat 7.3
•Configuration, reinstallation, very low maintenance
•8TB local disk space
At the RHIC computing facility:
•40TB disk (PHENIX’s share)
•4 tape silos w/ 420 TB each (no danger of running out of space here)
•~320 fast dual CPU Linux machines (PHENIX’s share)
Run ‘03 highlights
•Replaced essentially all previous Solaris DAQ machines with Linux PCs
•Went to Gigabit for all the core DAQ machines
•Tripled the CPU power in the countinghouse to perform online calibrations and analysis
•Boosted our connectivity to the HPSS storage system to 2×1000 Mbit/s (190 MB/s observed)
•Data taken: ~¼ PB/year
Raw data in Run-II: Au-Au 100 TB, d-Au 100 TB (100 days), p-p 10 TB; p-p in Run-III: 35 TB
Micro-DST (MDST): 15% of raw data
Au-Au, d-Au: 15 TB; p-p: 1.5 TB (Run-II), 5 TB (Run-III)
PHENIX Upgrade Plans
Locally in the countinghouse:
•go to Gigabit (almost) everywhere (replace ATM)
•use compression
•replace older machines, reclaim floor space, gain another factor of 2 in CPU and disks
•get cheap commodity disk space
In general:
•Move to gcc 3.2, loosely based on some RedHat 8.x flavor
•We have already augmented Objectivity with convenience databases (Postgres, MySQL), which will play a larger role in the future (a minimal query sketch follows this list)
•Network restructuring
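A minimal sketch of how such a convenience database could be queried from ROOT, assuming a hypothetical PostgreSQL host, table, and column; this uses ROOT's generic TSQLServer interface, not any PHENIX-specific API.

  // query_db.C -- hypothetical calibration lookup via ROOT's TSQLServer
  #include "TSQLServer.h"
  #include "TSQLResult.h"
  #include "TSQLRow.h"
  #include <cstdio>

  void query_db()
  {
    // Connect to a hypothetical PostgreSQL server holding calibration constants
    TSQLServer *db = TSQLServer::Connect("pgsql://dbhost.example/calib",
                                         "reader", "password");
    if (!db) return;

    // Fetch one (hypothetical) calibration constant for a given run
    TSQLResult *res = db->Query(
        "SELECT gain FROM emcal_calib WHERE run = 92345");
    if (res) {
      if (TSQLRow *row = res->Next()) {
        printf("gain = %s\n", row->GetField(0));
        delete row;
      }
      delete res;
    }
    db->Close();
    delete db;
  }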
Network data transfer at CHEP, KNU
● Real-time data transfer between CHEP (Daegu, Korea) and CCJ (RIKEN, Japan): 2 TB of physics data has been transferred in a single bbftp session. Observed rates:
maximum ~200 GB/day, typical ~100 GB/day.
● ~200 GB/day between CCJ and RCF (BNL, USA) is known; a comparable speed is expected between CHEP and RCF.
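For reference, averaged over a full day these volumes correspond to roughly
$100\ \mathrm{GB/day} \approx 1.2\ \mathrm{MB/s} \approx 9\ \mathrm{Mbit/s}$ and $200\ \mathrm{GB/day} \approx 2.3\ \mathrm{MB/s} \approx 19\ \mathrm{Mbit/s}$.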
Mass storage at KNU for PHENIX
Mass storage (HSM at KNU):
2.4 TB assigned to PHENIX out of a total of 50 TB; a bit over 2 TB used efficiently by PHENIX so far.
Experience: reliable storage up to 2 TB (5 TB may be possible). Optimized usage is under study.
Computing nodes at KT, Daejeon
10 nodes (Intel Xeon 2.0 GHz CPUs, 2 GB memory each). Centralized management of the analysis software is possible by NFS-mounting the CHEP disk on the cluster, supported by the fast network between KT and CHEP.
CPU usage for the PHENIX analysis is always above 50%.
Experience: the network between CHEP and KT is satisfactory; no difference from computing on the local cluster is seen.
[Diagram: the KT cluster nodes (KT cluster0 through KT cluster9) NFS-mount scripts, the software library, and data from the CHEP side (CHEP17); HSM mass storage of 2.5 TB; network transfers of ~100 GB/day.]
Yonsei Computing Resources for PHENIX
Linux (RedHat 7.3) ROOT Framework
[Diagram: Pentium 4 Linux computers run reconstruction and analysis jobs; a database holds calibrations & run info and a Tag DB; raw data & DSTs are kept on a 1.3 TB big disk built with RAID tools for Linux; the PHENIX library is obtained from CHEP over a 100 Mbps link.]
RAID IDE Storage System R&D at Yonsei
● IDE RAID5 system with Linux
● 8 disks (1 for parity) × 180 GB = 1.3 TB of storage
● Total cost ~$3000 (interface card + hard disks + PC)
● Write ~100 MB/s, read ~150 MB/s
● No serious problems experienced so far
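As a quick check of the quoted capacity (in RAID5, one disk's worth of parity is distributed across the array):
$(8-1) \times 180\ \mathrm{GB} = 1260\ \mathrm{GB} \approx 1.3\ \mathrm{TB}$ of usable space.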
Summary
* The computing facilities of the PHENIX experiment and their current status are reported. They handle a large data volume (~1/4 PB/year) well, and a move toward the HEP Grid is under way.
* We investigated parameters relevant to HEP Data Grid computing for PHENIX at CHEP:
1. Network data transfer: real-time transfer speeds of ~100 GB/day between the major research facilities CHEP (Korea), CCJ (Japan), and RCF (USA)
2. Mass storage (2.5 TB, HSM at CHEP)
3. 10 computing nodes (at KT, Daejeon)
Data analysis is in progress (Run-II p-p MDST); the overall analysis experience has been satisfactory.
* IDE RAID storage R&D is under way at Yonsei Univ.