Storage Resource Management (SRM) For Grid Applications A SciDAC supported middleware component


1

Storage Resource Management (SRM)
For Grid Applications
A SciDAC supported middleware component

Arie Shoshani
Computing Sciences Directorate
Lawrence Berkeley National Laboratory
http://sdm.lbl.gov/srm

2

Participants

PI: Arie Shoshani

LBNL – 2 FTEs:
• Arie Shoshani, PI
• Alex Sim, co-PI
• Junmin Gu
• Andreas Mueller

Fermilab – ½ FTE:
• Don Petravick, Co-PI
• Rich Wellner

3

Motivation

• Grid architecture has in the past emphasized:
  • Security
  • Compute resource coordination & scheduling
  • Network resource coordination & scheduling (QoS)
• SRM's role in the data grid architecture:
  • Storage resource coordination & scheduling
• Types of storage resource managers:
  • Disk Resource Manager (DRM)
  • Tape Resource Manager (TRM)
  • Hierarchical Resource Manager (HRM = TRM + DRM)
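The three manager types can be pictured as implementations of one common interface, with the HRM composing a TRM and a DRM. This is only an illustrative sketch: the class and method names (`StorageResourceManager`, `request_to_get`, the `/cache/` and `mss:/tape/` paths) are invented for the example and are not the actual SRM API.

```python
from abc import ABC, abstractmethod

class StorageResourceManager(ABC):
    """Common interface shared by DRM, TRM, and HRM (illustrative only)."""
    @abstractmethod
    def request_to_get(self, logical_file: str) -> str:
        """Make a file available and return its access location."""

class DiskResourceManager(StorageResourceManager):
    def __init__(self):
        self.cache = {}  # logical name -> path in the managed disk cache
    def request_to_get(self, logical_file):
        # Serve the file from the disk cache, allocating a slot if needed.
        return self.cache.setdefault(logical_file, f"/cache/{logical_file}")

class TapeResourceManager(StorageResourceManager):
    def request_to_get(self, logical_file):
        # Queue a tape mount in the MSS; return the staging location.
        return f"mss:/tape/{logical_file}"

class HierarchicalResourceManager(StorageResourceManager):
    """TRM + DRM: stage from tape, then serve from a disk cache."""
    def __init__(self):
        self.trm = TapeResourceManager()
        self.drm = DiskResourceManager()
    def request_to_get(self, logical_file):
        staged = self.trm.request_to_get(logical_file)  # 1. stage from MSS
        assert staged                                   # staging succeeded here
        return self.drm.request_to_get(logical_file)    # 2. serve from disk
```

The composition mirrors the slide: an HRM presents the same request interface as a DRM, but transparently stages files from the tape system behind it.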

4

Where Do SRMs Fit in Grid Architecture?

[Diagram: at the client's site, clients send a logical query to a Request Interpreter, which resolves it to logical files for request planning by a Request Executer. Consulting a Replica Catalog, a property-file index, and the Network Weather Service, the Request Executer issues pinning & file-transfer requests over the network as site-specific file requests to DRMs (each managing a disk cache) and to an HRM (managing a tape system with its disk cache).]

5

Challenges (1)

• Managing storage resources in a large, unreliable, heterogeneous distributed system
• Long-lasting data-intensive transactions
  • Can't afford to restart jobs
  • Can't afford to lose data, especially from experiments
• Types of failures
  • Storage system failures
    • Mass Storage System (MSS)
    • Disk system
  • Server failures
  • Network failures

6

Challenges (2)

• Heterogeneity
  • Operating systems (well understood)
  • MSS - HPSS, Castor, Enstore, …
  • Disk systems - system-attached, network-attached, parallel
• Optimization issues
  • Avoid extra file transfers - what to keep in each disk cache over time
  • How to maximize sharing for multiple users
  • Global optimization
  • Multi-tier storage system optimization
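One way to picture the "what to keep in each disk cache" question is a least-recently-used eviction policy that must also respect pinned files. The sketch below is an invented illustration of that interaction, not the actual SRM replacement policy; all names (`PinnedLRUCache`, `pin`, `release`) are hypothetical.

```python
from collections import OrderedDict

class PinnedLRUCache:
    """Toy disk cache: evicts least-recently-used files, never pinned ones."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.files = OrderedDict()   # file name -> size (oldest first)
        self.pins = set()            # files currently pinned by clients

    def add(self, name, size):
        # Evict unpinned files, oldest first, until the new file fits.
        while sum(self.files.values()) + size > self.capacity:
            victim = next((f for f in self.files if f not in self.pins), None)
            if victim is None:
                raise RuntimeError("no space: all resident files are pinned")
            del self.files[victim]
        self.files[name] = size

    def pin(self, name):
        self.pins.add(name)
        self.files.move_to_end(name)  # a pin also counts as recent use

    def release(self, name):
        self.pins.discard(name)       # file becomes evictable again
```

The "all resident files are pinned" error is exactly the no-space situation the next slide raises; maximizing sharing would correspond to many clients pinning the same cached copy.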

7

Specific Problems

• Managing resource space allocation
  • What if there is no space?
• Managing pinning of files
  • What if files can be removed in the middle of a transfer?
• Space reservations
  • What if multiple files are needed concurrently?
• File streaming
  • For processing a large set of files
• Pin-lock
  • What if you pinned files and the system deadlocks?
• User priorities
• Access control
  • Who can read/write a file?
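File streaming in the sense above means pushing a large file set through a bounded cache: pin a window of files, and as each is processed and released, pin the next one into the freed slot. A minimal sketch under those assumptions (the function name `stream_files` and the in-memory window stand in for real SRM pin/release calls):

```python
def stream_files(files, window, process):
    """Process every file while keeping at most `window` files pinned."""
    it = iter(files)
    pinned = []
    for f in it:                      # pre-pin the initial window
        pinned.append(f)              # (a real SRM would reserve space here)
        if len(pinned) == window:
            break
    while pinned:
        current = pinned.pop(0)
        process(current)              # client reads the pinned file
        # releasing `current` frees its slot; pin the next file into it
        nxt = next(it, None)
        if nxt is not None:
            pinned.append(nxt)

# Example: stream seven files through a three-file window.
order = []
stream_files([f"f{i}" for i in range(7)], 3, order.append)
```

Because releases are paired with new pins, space demand stays constant no matter how large the file set is, which is what makes long-running multi-file jobs feasible on a small cache.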

8

HRMs in PPDG (high-level view)

[Diagram: a Replica Coordinator drives replication from BNL to LBNL. The HRM at BNL (performs reads) stages files from its tape system into a disk cache; the HRM at LBNL (performs writes) receives an HRM-COPY request, issues HRM-GET, and pulls the files with GridFTP GET (pull mode) into its own disk cache and tape system.]

The Replica Coordinator:
• Monitors files written into BNL's HPSS
• Selects files to replicate
• Issues request_to_put for a file (or many files)

9

Details of Interactions

Replicating a file from BNL to LBNL-PDSF (client and the two HRMs):

1. Request to replicate
2. File request
3. Stage the file
4. Notify the caller
5. GridFTP from BNL to PDSF
6. Release the file
7. Notify the client (file in HRM)
8. Migrate the file to HPSS
9. Notify the client (file in HPSS)
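The nine steps can be written out as an ordered trace, which makes the division of labor explicit. The role assignments below (which HRM performs each step) are read off the diagram and should be taken as an assumption; the strings are descriptive labels, not real HRM or GridFTP calls.

```python
def replicate(filename, trace):
    """Append the nine replication steps for `filename` to `trace`."""
    trace.append(f"1 client -> LBNL HRM: request to replicate {filename}")
    trace.append("2 LBNL HRM -> BNL HRM: file request")
    trace.append(f"3 BNL HRM: stage {filename} from tape")
    trace.append("4 BNL HRM -> LBNL HRM: notify the caller (file staged)")
    trace.append(f"5 LBNL HRM: gridftp {filename} from BNL to PDSF")
    trace.append(f"6 LBNL HRM -> BNL HRM: release {filename}")
    trace.append("7 LBNL HRM -> client: file in HRM disk cache")
    trace.append(f"8 LBNL HRM: migrate {filename} to HPSS")
    trace.append("9 LBNL HRM -> client: file in HPSS")
    return trace
```

Note the early notification in step 7: the client can start using the file from the disk cache before the tape migration in steps 8-9 completes.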

10

Measurements

[Chart: timeline (3 Jan 2002, 12:31-12:38) of the replication progress of ten STAR DST files (e.g. set287_07_10evts_h_dst.xdf.STAR.DB), tracking each file from "File replication request start" through Staging_requested_at_BNL, Staging_started_at_BNL, Staging_finished_at_BNL, Transfered_to_PDSF_from_BNL, Migration_Requested, Migration_Finished, and Notified_Client; failures appear as FILE_REQUEST_FAILED.]

11

SC 2001 Demo Setup

[Diagram: a client in Denver sends a logical request to a Request Manager, which consults a bit-map index and feeds File Transfer Monitoring. Control paths fan out to servers at Berkeley (an HRM with GridFTP and a DRM with GridFTP), Chicago (a DRM with GridFTP), and Livermore (FTP), each fronting a disk cache; data paths carry the files from those caches back to the client's disk cache. Legend: logical request, data path, control path.]

12

Monitoring File Transfer

13

Accomplishments

• Developed HRMs and DRMs using the same uniform protocols

• Deployed in PPDG

• Developed a command-line interface to HRM

• Wrote a joint design specification in coordination with EDG, Jlab, and Fermi (to be presented at GGF)

• Wrote a paper for the MSS conference

• Future: develop a standard protocol

• Future: deploy HRM at ORNL & NERSC for the ESG II project
