Scientific Computing Resources – Ian Bird, Computer Center. Hall A Analysis Workshop, December 11, 2001
Transcript
Page 1

Scientific Computing Resources

Ian Bird – Computer Center
Hall A Analysis Workshop

December 11, 2001

Page 2

Overview

• Current Resources
  – Recent evolution
  – Mass storage – HW & SW
  – Farm
  – Remote data access
  – Staffing levels
• Future Plans
  – Expansion/upgrades of current resources
  – Other computing – LQCD
  – Grid Computing
    • What is it? Should you care?

Page 3

Jefferson Lab Scientific Computing Environment, November 2001
(diagram – components connected through a Gigabit Ethernet switching fabric to the JLAB network backbone)

• Batch Farm Cluster: 350 Linux nodes (400 MHz – 1 GHz), 10,000 SPECint95, managed by LSF + Java layer + web interface
• Interactive Analysis: 2 Sun 450 (4-processor), 2 4-processor Intel/Linux
• Lattice QCD Cluster: 40 Alpha/Linux (667 MHz), 256 Pentium 4 (Q2 FY02?), managed by PBS + web portal
• Unix, Linux, Windows desktops
• bbftp service and Grid gateway
• 16 TB cache disk: SCSI + EIDE disk, RAID 0 on Linux servers
• 2 TB farm cache: SCSI, RAID 0 on Linux servers
• 10 TB work areas: SCSI disk, RAID 5
• JASMine-managed mass storage: 2 STK silos, 10 9940, 10 9840, 8 Redwood drives; 10 Solaris/Linux data movers w/ 300 GB stage
• CUE general services
• Internet via ESnet (OC-3)

Page 4

JLAB Farm and Mass Storage Systems, November 2001
(diagram – network layout)

• Batch farm: 350 processors; 175 dual nodes, each connected at 100 Mb to a 24-port switch with Gb uplink (8 switches)
• Foundry BigIron 8000 switch: 256 Gb backplane, ~45 of 60 Gb ports in use
• Site router: CUE and general services
• CH router: incoming data from Halls A & C
• Fibre Channel direct from CLAS
• Tape: 2 STK silos, 10 9940, 10 9840, 8 Redwood drives; 10 Solaris/Linux data movers, each w/ 300 GB stage & Gb uplink
• Cache disk farm: 20 Linux servers, each with Gb uplink; total 16 TB SCSI/IDE, RAID 0
• Work disk farm: 4 Linux servers, each with Gb uplink; total 4 TB SCSI, RAID 5
• Work disks: 4 MetaStor systems, each with 100 Mb uplink; total 5 TB SCSI, RAID 5

Page 5

CPU Resources

• Farm
  – Upgraded this summer with 60 dual 1 GHz P III (4 cpu / 1U rackmount)
  – Retired the original 10 dual 300 MHz nodes
  – Now 350 cpu (400, 450, 500, 750, 1000 MHz)
    • ~11,000 SPECint95
  – Delivers > 500,000 SI95-hrs / week
    • Equivalent to 75 1 GHz cpu (see the estimate below)
• Interactive
  – Solaris: 2 E450 (4-proc)
  – Linux: 2 quad systems (4x450, 4x750 MHz)
  – If required, batch systems can be used (via LSF) to add interactive CPU to these (Linux) front ends
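A rough check of the "75 × 1 GHz cpu" equivalence, assuming a 1 GHz Pentium III rates at roughly 40 SPECint95 (an assumed figure, consistent with the farm's ~11,000 SI95 spread over 350 mixed-speed cpus):

```latex
\frac{500{,}000\ \text{SI95-hrs/week}}{168\ \text{hrs/week}} \approx 3{,}000\ \text{SI95 sustained},
\qquad
\frac{3{,}000\ \text{SI95}}{\sim 40\ \text{SI95 per 1 GHz PIII}} \approx 75\ \text{cpu}
```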

Page 6

Intel Linux Farm
(photos)

• First purchases: 9 duals per 24" rack
• Last summer: 16 duals (2U) + 500 GB cache (8U) per 19" rack
• Recently: 5 TB IDE cache disk (5 x 8U) per 19" rack

Page 7

Tape storage

• Added 2nd silo this summer
  – Required moving a room full of equipment
  – Added 10 9940 drives (5 as part of the new silo)
  – Current: 8 Redwood, 10 9840, 10 9940
    • Redwood: 50 GB @ 10 MB/s (helical scan, single reel)
    • 9840: 20 GB @ 10 MB/s (linear, mid-load cassette (fast))
    • 9940: 60 GB @ 10 MB/s (linear, single reel)
    • 9840 & 9940 are very reliable
    • 9840 & 9940 have upgrade paths that use the same media
      » 9940 2nd generation – 100 GB @ 20 MB/s ??
• Add 10 more 9940 this FY (budget..?)
• Replace Redwoods (reduce to 1-2)
  – Requires copying 4500 tapes – started – budget for tape?
  – Reasons: reliability, end of support(!)

Page 8

Disk storage

• Added cache space
  – For frequently used silo files, to reduce tape accesses
  – Now have 22 cache servers
    • 4 dedicated to the farm (~2 TB)
    • ~16 TB of cache space allocated to experiments
      – Some bought and owned by groups
  • Dual Linux systems, Gb network, ~1 TB disk, RAID 0
    – 9 SCSI systems
    – 13 IDE systems
      » Performance approximately equivalent
    – Good match of cpu : network throughput : disk space
    – This is a model that will scale by a few factors, but probably not by 10 (but there is as yet no solution to that)
• Looking at distributed file systems for the future – to avoid NFS complications – GFS, etc., but no production-level system yet
  – N.B. accessing data with jcache does not need NFS, and is fault tolerant
• Added work space
  – Added 4 systems to reduce load on fs3,4,5,6 (original /work)
  – Dual Linux systems, Gb network, ~1 TB disk, SCSI RAID 5
  – Performance on all systems is now good
• Problems
  – Some issues with IBM 75 GB ATA drives, 3ware IDE RAID cards, Linux kernels
  – System is reasonably stable, but not yet perfect – but alternatives are not cost-effective

Page 9

JASMine

• JASMine – Mass Storage System software
• Rationale – why write another MSS?
  – Had been using OSM
    • Not scaleable, not supported, had reached the limits of the software; had to run 2 instances to get sufficient drive capacity
    • Hidden from users by "Tapeserver" – a Java layer that
      » Hid the complexities of the OSM installations
      » Implemented tape disk buffers (stage)
      » Provided get, put, and managed cache (read copies of archived data) capabilities
  – Migration from OSM
    • Production environment…
      – Timescales driven by experiment schedules, need to add drive capacity
      – Retain the user interface
    • Replace the "osmcp" function – tape to disk, drive and library management
  – Choices investigated
    • Enstore, Castor, (HPSS)
      – Timescales, support, adaptability (missing functionality/philosophy – cache/stage)
  – Provide the missing functions within the Tapeserver environment, clean up and rework
    • JASMine (JLAB Asynchronous Storage Manager)

Page 10

Architecture

• JASMine
  – Written in Java
    • For data movement, as fast as C code
    • JDBC makes using and changing databases easy (see the sketch below)
  – Distributed Data Movers and Cache Managers
  – Scaleable to the foreseeable needs of the experiments
  – Provides scheduling
    • Optimizing file access requests
    • User and group (and location-dependent) priorities
  – Off-site cache or ftp servers for data exporting
• JASMine Cache Software
  – Stand-alone component – can act as a local or remote client; allows remote access to JASMine
  – Can be deployed to a collaborator to manage a small disk system, and as the basis for coordinated data management between sites
  – A cache manager runs on each cache server
    • Hardware is not an issue – just needs a JVM, a network connection, and a disk to store files
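A minimal illustration of the JDBC point above: the database-specific details live in the driver and connection URL, so moving between MySQL and another SQL database is largely a configuration change. The URL, table, and column names here are hypothetical, not taken from JASMine:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class VolumeLookup {
    public static void main(String[] args) throws Exception {
        // The JDBC URL (and driver on the classpath) is the only database-specific
        // part; pointing it at a different SQL server leaves the code unchanged.
        String url = System.getProperty("jdbc.url", "jdbc:mysql://dbhost/jasmine");
        try (Connection conn = DriverManager.getConnection(url, "reader", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT volume, status FROM volumes WHERE volume = ?")) {
            ps.setString(1, args[0]);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("volume") + " " + rs.getString("status"));
                }
            }
        }
    }
}
```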

Page 11

Software cont.

• MySQL database used by all servers
  – Fast and reliable
  – SQL
• Data format
  – ANSI standard labels with extra information
  – Binary data
  – Support to read legacy OSM tapes (cpio, no file labels)
• Protocol for file transfers
  – Writes to cache are never NFS
  – Reads from cache may be NFS

Page 12

(Architecture diagram)

• Central services: Client, Request Manager, Scheduler, Log Manager, Library Manager, Database
• Data Movers (replicated): each runs a Dispatcher, Cache Manager, Drive Manager, and Volume Manager, with an attached drive and disk
• Connection types shown: database connections, service connections, log connections

Page 13

JASMine Services

• Database
  – Stores metadata
    • Also presented to the user on an NFS filesystem as "stubfiles"
      – But could equally be presented as e.g. a web service, LDAP, …
      – Do not need to access stubfiles – just need to know filenames
  – Tracks status and locations of all requests, files, volumes, drives, etc.
• Request Manager
  – Handles user requests and queries
• Scheduler
  – Prioritizes user requests for tape access (see the sketch after this list)
    • priority = share / (0.01 + (num_a * ACTIVE_WEIGHT) + (num_c * COMPLETED_WEIGHT))
  – Host vs user shares, farm priorities
• Log Manager
  – Writes out log and error files and databases
  – Sends out notices for failures
• Library Manager
  – Mounts and dismounts tapes, as well as other library-related tasks
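A minimal sketch of the priority formula above, just to make the behaviour concrete: a share with many active or recently completed requests sinks in priority. The weights and the FairShare class are illustrative placeholders, not the actual JASMine code:

```java
public class FairShare {
    // Illustrative weights; the real values are site configuration.
    static final double ACTIVE_WEIGHT = 1.0;
    static final double COMPLETED_WEIGHT = 0.25;

    /**
     * priority = share / (0.01 + num_a*ACTIVE_WEIGHT + num_c*COMPLETED_WEIGHT)
     * share : configured share for the host or user
     * numA  : currently active requests charged to that share
     * numC  : recently completed requests charged to that share
     */
    static double priority(double share, int numA, int numC) {
        return share / (0.01 + numA * ACTIVE_WEIGHT + numC * COMPLETED_WEIGHT);
    }

    public static void main(String[] args) {
        // A user with no outstanding work beats one with 10 active requests.
        System.out.println(priority(1.0, 0, 0));   // 100.0
        System.out.println(priority(1.0, 10, 4));  // ~0.09
    }
}
```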

Page 14

JASMine Services – 2

• Data Mover
  – Dispatcher
    • Keeps track of available local resources and starts requests the local system can work on
  – Cache Manager
    • Manages a disk or disks for pre-staging data to and from tape
    • Sends and receives data to and from clients
  – Volume Manager
    • Manages tapes for availability
  – Drive Manager
    • Manages tape drives for usage

Page 15

User Access

• jput – put one or more files on tape
• jget – get one or more files from tape
• jcache – copy one or more files from tape to cache
• jls – get metadata for one or more files
• jtstat – status of the request queue
• Web interface – query status and statistics for the entire system

Page 16

Web interface
(screenshot)

Page 18

Data Access to cache

• NFS
  – A directory of links points the way
  – Mounted read-only by the farm
  – Users can mount it read-only on their desktops
• jcache
  – Java client
  – Checks whether files are on the cache disks
  – Will get/put files from/to the cache disks
  – More efficient than NFS, and avoids NFS hangs if a server dies, etc. – but users like NFS

Page 19

Disk Cache Management

• Disk pools are divided into groups
  – Tape staging
  – Experiments
  – Pre-staging for the batch farm
• Management policy is set per group
  – Cache – LRU: files removed as needed
  – Stage – reference counting
  – Explicit – manual addition and deletion
  – Policies are pluggable – easy to add (see the sketch below)
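A minimal sketch of what "pluggable policies" can look like in Java – one interface, one implementation per policy, chosen per disk-pool group. The interface and class names are hypothetical, not the actual JASMine classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** A replacement policy decides which cached files may be evicted. */
interface CachePolicy {
    List<CachedFile> selectForRemoval(List<CachedFile> files, long bytesNeeded);
}

/** Least-recently-used: evict the oldest-accessed files first. */
class LruPolicy implements CachePolicy {
    public List<CachedFile> selectForRemoval(List<CachedFile> files, long bytesNeeded) {
        List<CachedFile> byAge = new ArrayList<>(files);
        byAge.sort(Comparator.comparingLong(CachedFile::lastAccess));
        long freed = 0;
        int i = 0;
        while (i < byAge.size() && freed < bytesNeeded) {
            freed += byAge.get(i++).size();
        }
        return byAge.subList(0, i);
    }
}

/** Reference counting (stage pools): a file is removable only when nothing references it. */
class StagePolicy implements CachePolicy {
    public List<CachedFile> selectForRemoval(List<CachedFile> files, long bytesNeeded) {
        return files.stream().filter(f -> f.refCount() == 0).toList();
    }
}

record CachedFile(String path, long size, long lastAccess, int refCount) {}
```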

Page 20

Protocol for file moving

• Simple, extensible protocol for file copies (see the sketch below)
  – Messages are Java serialized objects passed over streams
  – Bulk data transfer uses raw data transfer over TCP
• Protocol is synchronous – all calls block
  – Asynchrony & multiple requests handled by threading
• CRC32 checksums at every transfer
• More fair than NFS
• A session may make many connections
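A minimal sketch of the shape of such an exchange – a serialized request object on the stream, then raw bytes for the payload, with a CRC32 accumulated as the data flows and a blocking wait for the server's verdict. Only the ingredients (Java serialization, TCP streams, CRC32, synchronous calls) come from the slide; the message class and field names are hypothetical:

```java
import java.io.*;
import java.net.Socket;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class SimpleFileSend {
    /** Hypothetical request header, sent as a Java serialized object. */
    static class PutRequest implements Serializable {
        final String path;
        final long length;
        PutRequest(String path, long length) { this.path = path; this.length = length; }
    }

    /** Synchronous put: blocks until the server acknowledges the transfer. */
    static void putFile(String host, int port, File file) throws IOException {
        try (Socket sock = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sock.getOutputStream()));
             DataInputStream in = new DataInputStream(sock.getInputStream())) {

            // 1. Control message: a serialized request object, length-prefixed.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(buf);
            oos.writeObject(new PutRequest(file.getName(), file.length()));
            oos.flush();
            out.writeInt(buf.size());
            buf.writeTo(out);

            // 2. Bulk data: raw bytes over the same TCP stream, CRC32 computed on the fly.
            CRC32 crc = new CRC32();
            try (InputStream data = new CheckedInputStream(new FileInputStream(file), crc)) {
                data.transferTo(out);
            }
            out.writeLong(crc.getValue());
            out.flush();

            // 3. Blocking wait for the server's verdict (the protocol is synchronous).
            if (!in.readBoolean()) {
                throw new IOException("server reported checksum mismatch for " + file.getName());
            }
        }
    }
}
```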

Page 21

Protocol for file moving (cont.)

• The cache server extends the basic protocol
  – Adds database hooks for the cache
  – Adds hooks for cache policies
  – Additional message types were added
• High-throughput disk pool
  – Database shared by many servers
  – Any server in the pool can look up a file's location
    • But data transfer is always direct between the client and the node holding the file
  – Adding servers and disk to the pool increases throughput with no overhead
    • Provides fault tolerance

Page 22

Example: get from cache

• cacheClient.getFile("/foo", "halla"); (sketched below)
  – Send a locate request to any cache server
  – Receive the locate reply
  – Contact the appropriate server
  – Initiate a direct transfer
  – Returns true on success
(diagram: the client (a farm node) asks cache4 "Where is /foo?"; via the shared database the reply is "cache3 has /foo"; the client then requests /foo directly from cache3, which sends it)
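A minimal sketch of that client-side flow under the assumptions above – any pool member can answer the locate query from the shared database, and the data then moves directly from the node that holds the file. Apart from getFile, the class and method names are hypothetical:

```java
import java.util.List;

public class CacheClient {
    private final List<String> poolMembers;   // any member can answer a locate query

    public CacheClient(List<String> poolMembers) { this.poolMembers = poolMembers; }

    /** Mirrors cacheClient.getFile("/foo", "halla"); returns true on success. */
    public boolean getFile(String path, String group) {
        try {
            // 1. Locate: ask any server in the pool; it consults the shared database.
            String holder = locate(poolMembers.get(0), path, group);
            if (holder == null) {
                return false;                  // not on cache disk (caller falls back to tape)
            }
            // 2. Transfer: always direct between this client and the node holding the file.
            transferFrom(holder, path);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Placeholders for the wire protocol sketched on the earlier slides.
    private String locate(String server, String path, String group) { return "cache3"; }
    private void transferFrom(String server, String path) { /* raw TCP copy + CRC32 */ }
}
```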

Page 23

Example: simple put to cache

• putFile("/quux", "halla", 123456789); (sketched below)
(diagram: the client (a data mover) asks "Where can I put /quux?"; via the shared database the reply is "cache4 has room", and the client sends the file directly to cache4)
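The put side of the same pattern, again as a hedged sketch: the size argument lets the pool pick a server with enough free space before any data moves. Apart from putFile and its arguments, the names are hypothetical:

```java
import java.util.List;

public class CachePutClient {
    private final List<String> poolMembers;

    public CachePutClient(List<String> poolMembers) { this.poolMembers = poolMembers; }

    /** Mirrors putFile("/quux", "halla", 123456789); returns true on success. */
    public boolean putFile(String path, String group, long size) {
        try {
            // 1. Ask the pool (via the shared database) which server has room for `size` bytes.
            String target = allocate(poolMembers.get(0), path, group, size);
            if (target == null) {
                return false;                  // no server has enough free space
            }
            // 2. Stream the file directly from the client (here, a data mover) to that server.
            transferTo(target, path);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    private String allocate(String server, String path, String group, long size) { return "cache4"; }
    private void transferTo(String server, String path) { /* raw TCP copy + CRC32 */ }
}
```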

Page 24

Fault Tolerance

• Dead machines do not stop the system
  – Data Movers work independently
    • Unfinished jobs restart on another mover
  – Dead cache servers only impact NFS clients
    • The system recognizes a dead server and will re-cache the file from tape
    • If users did not use NFS they would never see a failure – just an extended access time
• Exception handling for (see the sketch below)
  – Receive timeouts
  – Refused connections
  – Broken connections
  – Complete garbage on connections
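A minimal sketch of the failover behaviour described above, assuming a client that works through a list of movers and treats timeouts, refused or broken connections, and garbled replies the same way – move on to the next server. All names are illustrative:

```java
import java.io.IOException;
import java.io.StreamCorruptedException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.List;

public class FailoverClient {
    /** Try each candidate server in turn; a dead machine just costs time, not the request. */
    static byte[] fetchWithFailover(List<String> servers, String path) throws IOException {
        IOException last = null;
        for (String server : servers) {
            try {
                return fetch(server, path);
            } catch (SocketTimeoutException | ConnectException | StreamCorruptedException e) {
                last = e;   // timeout, refused connection, or garbage on the connection: try the next server
            } catch (IOException e) {
                last = e;   // connection broken mid-transfer: try the next server
            }
        }
        throw new IOException("all servers failed for " + path, last);
    }

    private static byte[] fetch(String server, String path) throws IOException {
        // The wire protocol from the earlier slides would go here.
        throw new ConnectException(server + " unreachable (placeholder)");
    }
}
```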

Page 25

Authorization and Authentication

• Shared secret for each file transfer session
  – Session authorization by policy objects
    • Example: receive 5 files from user@bar
• Plug-in authenticators (a sketch follows)
  – Establish the shared secret between client and server
  – No clear-text passwords
  – Extend to be compatible with GSI
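A minimal sketch of what a plug-in authenticator interface could look like: the plug-in's only job is to establish a per-session shared secret without sending a clear-text password, and the session then authorizes individual actions against a policy object. All interfaces and names here are hypothetical, not the GSI or JASMine APIs:

```java
import javax.crypto.SecretKey;

/** Pluggable mechanism for establishing a per-session shared secret. */
interface Authenticator {
    /** Runs a challenge/response (or similar) exchange; never sends a password in the clear. */
    SecretKey establishSharedSecret(String principal, SessionChannel channel) throws AuthException;
}

/** Policy object consulted for each action within an authenticated session. */
interface SessionPolicy {
    /** e.g. allow "receive 5 files from user@bar". */
    boolean permits(String principal, String action, int count);
}

/** Minimal view of the control connection the authenticator talks over. */
interface SessionChannel {
    void send(byte[] message) throws AuthException;
    byte[] receive() throws AuthException;
}

class AuthException extends Exception {
    AuthException(String msg) { super(msg); }
}
```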

Page 26

JASMine Bulk Data Transfers

• The model supports parallel transfers
  – Many files at once, though not bbftp-style parallel streams
    • But the stream class could be replaced with a parallel stream
      – For bulk data transfer over WANs
• Firewall issues
  – The client initiates all connections

Page 27

Architecture: Disk pool hardware

• SCSI disk servers
  – Dual Pentium III 650 (later 933) MHz CPUs
  – 512 MB 100 MHz SDRAM ECC
  – ASUS P2B-D motherboard
  – NetGear GA620 Gigabit Ethernet PCI NIC
  – Mylex eXtremeRAID 1100, 32 MB cache
  – Seagate ST150176LW (qty. 8) – 50 GB Ultra2 SCSI in hot-swap disk carriers
  – CalPC 8U rack-mount case with redundant 400 W power supplies
• IDE disk servers
  – Dual Pentium III 933 MHz CPUs
  – 512 MB 133 MHz SDRAM ECC
  – Intel STL2 or ASUS CUR-DLS motherboard
  – NetGear GA620 or Intel PRO/1000 T Server Gigabit Ethernet PCI NIC
  – 3ware Escalade 6800
  – IBM DTLA-307075 (qty. 12) – 75 GB Ultra ATA/100 in hot-swap disk carriers
  – CalPC 8U rack-mount case with redundant 400 W power supplies

Page 28

Cache Performance

• Matches network, disk I/O, and CPU performance to the size of the disk pool:
  – ~800 GB disk
  – 2 x 850 MHz CPUs
  – Gb Ethernet

Page 29

Cache status
(screenshot)

Page 30

Performance – SCSI vs IDE

• Disk array / file system – ext2
  – SCSI disk server: 8 x 50 GB disks in a RAID-0 stripe over 2 SCSI controllers
    • 68 MB/s single disk write
    • 79 MB/s burst for a single disk write
    • 52 MB/s single disk read
    • 56 MB/s burst for a single disk read
  – IDE disk server: 6 x 75 GB disks in a RAID-0 stripe
    • 64 MB/s single disk write
    • 77 MB/s burst for a single disk write
    • 48 MB/s single disk read
    • 49 MB/s burst for a single disk read

Page 31

Performance – NFS vs jcache

• NFS v2 UDP – 16 clients, rsize=8192 and wsize=8192
  – Reads
    • SCSI disk servers
      – 7700 NFS ops/sec at 80% cpu utilization
      – 11000 NFS ops/sec burst at 83% cpu utilization
      – 32 MB/s at 83% cpu utilization
    • IDE disk servers
      – 7700 NFS ops/sec at 72% cpu utilization
      – 11000 NFS ops/sec burst at 92% cpu utilization
      – 32 MB/s at 72% cpu utilization
• jcache – 16 clients
  – Reads
    • SCSI disk servers: 32 MB/s at 100% cpu utilization
    • IDE disk servers: 32 MB/s at 100% cpu utilization

Page 32

JASMine system performance

• End-to-end performance
  – i.e. tape load, copy to stage, network copy to client
  – Aggregate sustained performance of 50 MB/s is regularly observed in production
  – During stress tests, up to 120 MB/s was sustained for several hours
    • A data mover with 2 drives can handle ~15 MB/s (disk contention is the limit)
  – Expect the current system to handle 150 MB/s; scaleable by adding data movers & drives
  – N.B. this is performance to a network client!
• Data handling
  – Currently the system regularly moves 2-3 TB per day in total
    • ~6000 files per day, ~2000 requests (see the averages below)

Page 35

JASMine performance
(chart)

Page 36

Tape migration

• Begin migration of 5000 Redwood tapes to 9940
  – Procedure written
  – Uses any/all available drives
  – Uses staging to allow re-packing of tapes
  – Expected to last 9-12 months (see the estimate below)
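A rough consistency check on that estimate, assuming the tapes are mostly full at the Redwood capacity of 50 GB and are read at about the quoted 10 MB/s (how many drives can run concurrently alongside production is the real unknown):

```latex
5000 \times 50\ \text{GB} = 250\ \text{TB},
\qquad
\frac{250\ \text{TB}}{10\ \text{MB/s}} \approx 2.5 \times 10^{7}\ \text{s} \approx 290\ \text{days of one continuous stream}
```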

Page 37

Typical Data Flows
(diagram; components: batch farm cluster – 350 Linux nodes (400 MHz – 1 GHz), 10,000 SPECint95, managed by LSF + Java layer + web interface; 16 TB cache disk – SCSI + EIDE, RAID 0 on Linux servers; 10 TB work areas – SCSI disk, RAID 5)

• Raw data: < 10 MB/s over Gigabit Ethernet (Halls A & C)
• Raw data: > 20 MB/s over Fibre Channel (Hall B)
• Internal flows of 25-30 MB/s (shown twice in the diagram)

Page 38

How to make optimal use of the resources

• Plan ahead!
• As a group:
  – Organize data sets in advance (~a week) and use the cache disks for their intended purpose
    • Hold frequently used data to reduce tape access
  – In a high-data-rate environment no other strategy works
• When running farm productions
  – Use jsub to submit many jobs in one command – as it was designed
    • Optimizes tape accesses
  – Gather output files together on work disks and make a single jput for a complete tape's worth of data

Page 39

Remote data access

• Tape copying is deprecated
  – Expensive, time consuming (for you and us), and inefficient
  – We have an OC-3 (155 Mbps) connection that is under-utilized; filling it will get us upgraded to OC-12 (622 Mbps)
    • At the moment we do often have to coordinate with ESnet and peers to ensure a high-bandwidth path, but this is improving as Grid development continues
• Use network copies
  – bbftp service
    • Parallel, secure ftp – optimizes use of WAN bandwidth
• Future
  – Remote jcache
    • The cache manager can be deployed remotely – demonstration Feb 02
  – Remote silo access, policy-based (unattended) data migration
  – GridFTP, bbftp, bbcp
    • Parallel, secure ftp (or ftp-like)
    • As part of a Grid infrastructure
      – PKI authentication mechanism

Page 40

(Data-) Grid Computing

Page 41

Particle Physics Data Grid Collaboratory Pilot

Who we are:
  Four leading Grid computer science projects and six international high energy and nuclear physics collaborations

What we do:
  Develop and deploy Grid services for our experiment collaborators, and promote and provide common Grid software and standards

The problem at hand today:
  Petabytes of storage, Teraops/s of computing, thousands of users, hundreds of institutions, 10+ years of analysis ahead

Page 42

PPDG Experiments

• ATLAS – A Toroidal LHC ApparatuS at CERN – runs 2006 on
  Goals: TeV physics – the Higgs and the origin of mass …
  http://atlasinfo.cern.ch/Atlas/Welcome.html
• BaBar – at the Stanford Linear Accelerator Center – running now
  Goals: study CP violation and more
  http://www.slac.stanford.edu/BFROOT/
• CMS – the Compact Muon Solenoid detector at CERN – runs 2006 on
  Goals: TeV physics – the Higgs and the origin of mass …
  http://cmsinfo.cern.ch/Welcome.html/
• D0 – at the D0 colliding beam interaction region at Fermilab – runs soon
  Goals: learn more about the top quark, supersymmetry, and the Higgs
  http://www-d0.fnal.gov/
• STAR – Solenoidal Tracker At RHIC at BNL – running now
  Goals: quark-gluon plasma …
  http://www.star.bnl.gov/
• Thomas Jefferson National Laboratory – running now
  Goals: understanding the nucleus using electron beams …
  http://www.jlab.org/

Page 43

PPDG Computer Science Groups

• Condor – develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing on large collections of computing resources with distributed ownership.
  http://www.cs.wisc.edu/condor/
• Globus – developing fundamental technologies needed to build persistent environments that enable software applications to integrate instruments, displays, and computational and information resources that are managed by diverse organizations in widespread locations.
  http://www.globus.org/
• SDM – Scientific Data Management Research Group – optimized and standardized access to storage systems.
  http://gizmo.lbl.gov/DM.html
• Storage Resource Broker – client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and cataloging/accessing replicated data sets.
  http://www.npaci.edu/DICE/SRB/index.html

Page 44

Delivery of End-to-End Applications & Integrated Production Systems

…to allow thousands of physicists to share data & computing resources for scientific processing and analyses

• Operators & users
• Resources: computers, storage, networks

PPDG focus:
  – Robust data replication
  – Intelligent job placement and scheduling
  – Management of storage resources
  – Monitoring and information of global services

Relies on Grid infrastructure:
  – Security & policy
  – High-speed data transfer
  – Network management

Page 45

Project Activities, End-to-End Applications and Cross-Cut Pilots

• Project Activities are focused Experiment – Computer Science collaborative developments
  – Replicated data sets for science analysis – BaBar, CMS, STAR
  – Distributed Monte Carlo production services – ATLAS, D0, CMS
  – Common storage management and interfaces – STAR, JLAB
• End-to-End Applications are used in experiment data handling systems to give real-world requirements, testing and feedback
  – Error reporting and response
  – Fault-tolerant integration of complex components
• Cross-Cut Pilots for common services and policies
  – Certificate Authority policy and authentication
  – File transfer standards and protocols
  – Resource monitoring – networks, computers, storage

Page 46

Year 0.5-1 Milestones (1)

Align milestones to experiment data challenges:
  – ATLAS – production distributed data service – 6/1/02
  – BaBar – analysis across partitioned dataset storage – 5/1/02
  – CMS – distributed simulation production – 1/1/02
  – D0 – distributed analyses across multiple workgroup clusters – 4/1/02
  – STAR – automated dataset replication – 12/1/01
  – JLAB – policy-driven file migration – 2/1/02

Page 47

Year 0.5-1 Milestones

Common milestones with the EDG:
  – GDMP – robust file replication layer – joint project with EDG Work Package (WP) 2 (Data Access)
  – Support of Project Month (PM) 9 WP6 TestBed milestone; will participate in integration fest at CERN – 10/1/01
  – Collaborate on PM21 design for WP2 – 1/1/02
  – Proposed WP8 application tests using the PM9 testbed – 3/1/02

Collaboration with GriPhyN:
  – SC2001 demos will use common resources, infrastructure and presentations – 11/16/01
  – Common, GriPhyN-led grid architecture
  – Joint work on monitoring proposed

Page 48

Year ~0.5-1 "Cross-cuts"

• Grid file replication services used by >2 experiments:
  – GridFTP – production releases
    • Integrate with D0-SAM, STAR replication
    • Interfaced through SRB for BaBar, JLAB
    • Layered use by GDMP for CMS, ATLAS
  – SRB and Globus replication services
    • Include robustness features
    • Common catalog features and API
  – GDMP/Data Access layer continues to be shared between EDG and PPDG
• Distributed job scheduling and management used by >1 experiment:
  – Condor-G, DAGMan, Grid-Scheduler for D0-SAM, CMS
  – Job specification language interfaces to distributed schedulers – D0-SAM, CMS, JLAB
• Storage resource interface and management
  – Consensus on API between EDG, SRM, and PPDG
  – Disk cache management integrated with data replication services

Page 49

Year ~1 other goals

• Transatlantic application demonstrators:
  – BaBar data replication between SLAC and IN2P3
  – D0 Monte Carlo job execution between Fermilab and NIKHEF
  – CMS & ATLAS simulation production between Europe and the US
• Certificate exchange and authorization
  – DOE Science Grid as CA?
• Robust data replication
  – Fault tolerant, between heterogeneous storage resources
• Monitoring services
  – MDS2 (Metacomputing Directory Service)?
  – Common framework
  – Network, compute and storage information made available to scheduling and resource management

Page 50

PPDG activities as part of the Global Grid Community

• Coordination with other Grid projects in our field:
  – GriPhyN – Grid Physics Network
  – European DataGrid
  – Storage Resource Management collaboratory
  – HENP Data Grid Coordination Committee
• Participation in experiment and Grid deployments in our field:
  – ATLAS, BaBar, CMS, D0, STAR, JLAB experiment data handling systems
  – iVDGL/DataTAG – International Virtual Data Grid Laboratory
  – Use DTF computational facilities?
• Active in standards committees:
  – Internet2 HENP Working Group
  – Global Grid Forum

Page 51

Staffing Levels

• We are stretched thin
  – But compared with other labs with similar data volumes we are efficient
    • Systems support group: 5 + 1 vacant
    • Farms, MSS development: 2
    • HW support / networks: 3.7
    • Telecom: 2.3
    • Security: 2
    • User services: 3
    • MIS, database support: 8
    • Support for engineering: 1
  – We cannot do as much as we would like

Page 52

Future (FY02)

• Removing Redwoods is a priority
  – Copying tapes, replacing drives with 9940s
• Modest farm upgrades – replace older CPUs as budget allows
  – Improve interactive systems
• Add more /work, /cache
• Grid developments:
  – Visible as efficient WAN data replication services
• After FY02
  – Global filesystems – to supersede NFS
  – 10 Gb Ethernet
  – Disk vs. tape? Improved tape densities, data rates
• We welcome (coordinated) input as to what would be most useful for your physics needs

