
SRM 2.2 Issues

Well, er, and 2.3 too

Jens Jensen (STFC RAL/GridNet2)

On behalf of GSM-WG

OGF22, Cambridge, MA

This Talk

• Deviates from previous principles of being for beginners
  – Technical
  – Less polished…
  – May be useful for others…

• Expose the standard and protocol process
  – Not many answers: kickstart (restart) the process

• Combines the two sessions
  – Input (mainly) from dCache, CASTOR, StoRM

Aims

• Revisit specification
  – Implementations’ deviations from OGF specifications
  – Ensure another group can interoperate
  – If someone else were to start from scratch
  – E.g. SRB (ASGC work)

• Aim is not to start work on 2.3
  – I.e. the aim is not – not the aim is to not, not that aim is not to start
  – If that makes sense

A Very Brief History

• Spec from 2006

• Then came implementations

• Then came WLCG

• …revisit spec

• Now getting experiences

• …revisit spec, highlight issues

• …think about next steps

Philosophies

• Manage diverse storage systems (but nothing else)

• User interface (not admin)
• Open Standard
  – A standard is not a standard until it is a standard (next slide)

• Open participation (no fees, no closed societies)
• Protect storage from Grid?
• Encourage best practices?
• Encourage uniformity? Allow diversity?
• The File is the unit of currency (not datasets)

Compare OASIS

• “Approved within an OASIS Committee,”
• “Submitted for public review,”
• “Implemented by at least three organizations,”
• “And finally ratified by the Consortium's membership at-large.”

• We would add that the three implementations “must interoperate”!

WLCG

• Wide deployment

• “Now get experience” with WLCG

• MoU: Significant changes to spec…

• Do they make sense? Process.

• What about smaller customers?

• Tape1Disk1 = ONLINE_AND_NEARLINE?
  – …No. In cache does not mean always in cache

Space Tokens on Get

• srmPrepareToPut uses a space token (description)

• srmPrepareToGet doesn’t
  – Same for srmBringOnline

• Problem for many implementations
  – dCache, CASTOR
  – dCache: MSS doesn’t see space token
  – StoRM: not needed

Other get issues

• Getting directories?
  – Not supported?
  – Or special permissions required?
  – Should the same apply to large bulk requests?

Finance Use Cases

• Ezio Corso (ICTP/E-Grid) (StoRM)
  – Compare EGEE industry liaison
  – “Complexity of financial instruments”
  – “more stringent risking and reporting requirements”
  – “Point solution” grids inefficient (silo)
  – Big computing makes data the bottleneck
  – Access control by individuals

Spaces

• Access Control on spaces
  – Also to be published in GLUE 1.3 schema as ACBR on VOInfo

• Reserving subspaces of spaces

• Summarising spaces for Owner

• Query space status?

What is a Space Anyway?

• A collection of one or more physical storage components/areas?

• With a common baseline set of capabilities (access latency, etc.)?

• Not to even mention “free” space, “used” space, etc.
  – Tricky to define
  – Even more tricky to measure
  – Still more tricky to get agreement

What is a Space anyway?

• Is everything a space?
  – Suggestion to have top-level static spaces

• Is disk a space? Or can a space have disk?
• Spaces can be named by space token descriptions
  – Always named by space token description?
  – Can be referenced by path? Non-uniquely?
  – Can be referenced (non-uniquely) by capabilities?

• Is a (static) space an SA (GLUE Storage Area)?
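The naming questions above can be made concrete with a minimal Python sketch. It assumes, hypothetically, that each space has a unique token, a free-form token description and a single access-latency capability. The class and method names are illustrative and are not part of the SRM interface. The point is that a lookup by description, or by capability, returns a list rather than one space.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Space:
    token: str           # unique token issued by the SRM when the space is reserved
    description: str     # user-chosen space token description; need not be unique
    access_latency: str  # one baseline capability shared by the whole space


@dataclass
class SpaceRegistry:
    spaces: List[Space] = field(default_factory=list)

    def add(self, space: Space) -> None:
        self.spaces.append(space)

    def tokens_for(self, description: str) -> List[str]:
        # Lookup by description can match zero, one or several spaces
        return [s.token for s in self.spaces if s.description == description]

    def tokens_with_capability(self, access_latency: str) -> List[str]:
        # Referencing by capability is non-unique in the same way
        return [s.token for s in self.spaces if s.access_latency == access_latency]
```

A client that assumes "one description, one space" would silently pick an arbitrary entry from `tokens_for`. That is the ambiguity the slide raises.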

Space Behaviour

• What happens if a file is released?
  – Space given back to the Space?
  – Space does not re-grow?

• Permanent file in limited space?
  – Used to be: not permitted
  – Now, space is shrunk and released
  – Keep token around, or permit recycling?
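The two answers to "what happens if a file is released?" lead to different accounting. The following toy Python model makes the difference explicit. The `ReservedSpace` class and its `regrow_on_release` flag are hypothetical and are not taken from the specification. Sizes are in bytes.

```python
class ReservedSpace:
    """Toy model of a space reservation (hypothetical, not the SRM API)."""

    def __init__(self, reserved: int, regrow_on_release: bool):
        self.reserved = reserved                  # size of the reservation
        self.used = 0                             # bytes occupied by files in the space
        self.regrow_on_release = regrow_on_release

    @property
    def free(self) -> int:
        return self.reserved - self.used

    def put(self, size: int) -> None:
        if size > self.free:
            raise ValueError("not enough free space in reservation")
        self.used += size

    def release(self, size: int) -> None:
        # In both policies the file no longer counts against the space
        self.used -= size
        if not self.regrow_on_release:
            # "Space does not re-grow": the released capacity leaves the
            # reservation, so the reservation shrinks by the file's size
            self.reserved -= size
```

After a put and release of 40 bytes in a 100-byte space, the first policy leaves 100 bytes free. The second leaves a 60-byte reservation, which is the "shrunk" behaviour described for permanent files.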

Permissions

• Simple Unixy (POSIX) permissions
• Default permissions on directories
  – Inheritance from above?
  – Consistent with space permissions, if applicable?
  – Default (per VO?)

• Permit for roles and groups?
• Stage in permission (protect write cache)
  – Not the same as reading
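Two of the questions above are sketched below in Python: where a directory's default permissions come from, and why stage-in should be a separate right from read. The model is a minimal illustration under stated assumptions. It makes three choices that are assumptions and not decisions from the slide: an explicit default beats inheritance, inheritance beats a per-VO fallback, and `STAGE` is an extra permission bit. `Perm`, `Directory` and `VO_DEFAULT` are hypothetical names.

```python
from enum import Flag


class Perm(Flag):
    READ = 1
    WRITE = 2
    STAGE = 4   # hypothetical: right to trigger a recall from tape into the disk cache


VO_DEFAULT = Perm.READ  # hypothetical per-VO fallback default


class Directory:
    def __init__(self, path, parent=None, default_perms=None):
        self.path = path
        self.parent = parent
        if default_perms is not None:
            self.default_perms = default_perms          # set explicitly on this directory
        elif parent is not None:
            self.default_perms = parent.default_perms   # inherited from above
        else:
            self.default_perms = VO_DEFAULT             # nothing above: per-VO default


def can_read(perms):
    return Perm.READ in perms


def can_stage(perms):
    # Reading a file already in the disk cache does not imply the right to
    # recall files from tape and so consume write-cache space
    return Perm.STAGE in perms
```

With this split, a user holding only `READ` can read cached copies but cannot fill the cache with recalls.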

Permissions

• StoRM calls out to LFC
  – Access control API in SRM not adequate
  – Use LFC’s API

• Multiple StoRMs can share an LFC

• => Can synchronise between SE and LFC

Return Codes

• SRM_REQUEST_QUEUED

• SRM_REQUEST_INPROGRESS

• srmCopy()
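SRM_REQUEST_QUEUED and SRM_REQUEST_INPROGRESS are the non-terminal states of an asynchronous request, so a client has to poll. The minimal Python polling loop below treats those two codes as "keep waiting" and anything else as final. `get_status` is a hypothetical callable wrapping whatever status call the client uses. The timeout and interval values are placeholders, not recommendations.

```python
import time

# Non-terminal codes named on the slide; every other code ends the wait
PENDING = {"SRM_REQUEST_QUEUED", "SRM_REQUEST_INPROGRESS"}


def wait_for_request(get_status, timeout=60.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll a hypothetical get_status() until the request leaves the pending states."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status not in PENDING:
            return status            # terminal: success or some failure code
        if clock() >= deadline:
            raise TimeoutError(f"request still {status} after {timeout}s")
        sleep(interval)
```

`sleep` and `clock` are parameters only so the loop can be tested without waiting. A real client would probably back off the interval for long-queued requests.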

Use of GSI authentication

• Currently using SOAP over GSI sockets
• GSI needed for delegation
• Delegation needed for srmCopy() (only)
• Incompatible with SSL
• Proposal to use gLite delegation
  – SOAP API specifically for delegation
  – AstroGrid uses home-made REST-based

• Not using WS-Anything
  – Many are Java only, too complex, not mature

FileStorageType

• Volatile, Durable, Permanent

• Should have been:

• ReleaseWhenExpired, WarnWhenExpired, NeverExpire
  – Avoid confusion with overloaded terms from 1.1 (wrongly named in spec)

• What is done on Durable/WarnWE timeout? (“raise error condition”)
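The renaming makes the lifetime semantics visible in the name, as the following Python sketch illustrates. The enum values are the spec's names, and the comments give the slide's proposed names. The action strings are hypothetical. The `DURABLE` branch only echoes the spec's "raise error condition" phrase, because the slide points out that what this means in practice is not defined.

```python
from enum import Enum


class FileStorageType(Enum):
    VOLATILE = "VOLATILE"     # proposed name: ReleaseWhenExpired
    DURABLE = "DURABLE"       # proposed name: WarnWhenExpired
    PERMANENT = "PERMANENT"   # proposed name: NeverExpire


def on_lifetime_expiry(storage_type: FileStorageType) -> str:
    """Return what a hypothetical SRM does when a file's lifetime runs out."""
    if storage_type is FileStorageType.VOLATILE:
        return "release"       # file may be removed and its space reclaimed
    if storage_type is FileStorageType.DURABLE:
        return "raise_error"   # spec: "raise error condition"; exact action undefined
    return "keep"              # PERMANENT files have no lifetime to expire
```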

Access Latency

• OFFLINE not defined

• Not used by WLCG

• But does that mean it doesn’t exist?

• ONLINE_AND_NEARLINE mentioned

• LOST…

• UNAVAILABLE…
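The WLCG point that "in cache does not mean always in cache" is easier to see if a file's locality is treated as a snapshot of where its copies are right now, rather than as a property of the space. This is a minimal Python sketch that assumes a file has at most one disk copy and one tape copy. The value names are the ones on these slides. The rule that derives them is our own illustration and is not taken from the specification.

```python
from enum import Enum


class FileLocality(Enum):
    ONLINE = "ONLINE"                              # disk copy only
    NEARLINE = "NEARLINE"                          # tape copy only; needs a recall
    ONLINE_AND_NEARLINE = "ONLINE_AND_NEARLINE"    # both, at this moment
    LOST = "LOST"                                  # no copy left anywhere
    UNAVAILABLE = "UNAVAILABLE"                    # copies may exist but cannot be reached


def current_locality(on_disk: bool, on_tape: bool, servers_up: bool = True) -> FileLocality:
    """Derive locality from a snapshot of where the file's copies are now."""
    if not servers_up:
        return FileLocality.UNAVAILABLE
    if on_disk and on_tape:
        return FileLocality.ONLINE_AND_NEARLINE
    if on_disk:
        return FileLocality.ONLINE
    if on_tape:
        return FileLocality.NEARLINE
    return FileLocality.LOST
```

A cached copy of a tape file reports ONLINE_AND_NEARLINE until the cache evicts it, then NEARLINE. That is why this value cannot stand in for a Tape1Disk1 guarantee.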

Default

• Certain aspects of API optional
  – Standard default?
  – Or implementation-defined default?
  – E.g., “default” space

• Default filesize on put?
  – Is it 1?
  – Is it implementation dependent? Space dependent?
  – Is it returned?
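One way to answer "default filesize on put" consistently is a fixed resolution order that also reports which default was applied. The Python sketch below uses the order client value, then space default, then implementation default. That order is an assumption, and so is the constant 1, which simply picks up the slide's "Is it 1?".

```python
IMPLEMENTATION_DEFAULT_SIZE = 1  # hypothetical; the slide only asks whether it is 1


def resolve_expected_size(requested=None, space_default=None):
    """Pick the expected file size for a put, most specific source first.

    Returns (size, source) so the caller can tell the user which default applied.
    """
    if requested is not None:
        return requested, "client"
    if space_default is not None:
        return space_default, "space"
    return IMPLEMENTATION_DEFAULT_SIZE, "implementation"
```

Returning the source alongside the size is one possible answer to "Is it returned?"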

Implicit

• Implicit pinning
• Implicit reservations
• Implicit lifetimes
• Implicit changes on action
• Implicit changes on expiry

• Surprising for users?
• Complicates implementations?
• What if permission denied for implicit action?

• What is reasonable?

Explicit but unknown

• Changing spaces (capabilities)
  – WLCG restricted D1T1 <-> D0T1 (more or less)

Best Practices for Clients

• Propagate errors to user

• Clean up after yourself…
  – Even after unclean exit

• Should SRM use request timeout and keepalive?
  – Cancel at any point?
  – Or only when queueing?
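"Clean up after yourself, even after unclean exit" maps naturally onto a context manager that aborts an outstanding request whenever the client leaves abnormally. This is a minimal Python sketch. `submit` and `abort` are hypothetical callables. For example, `abort` might wrap the client's srmAbortRequest call.

```python
import contextlib


@contextlib.contextmanager
def srm_request(submit, abort):
    """Submit a request and abort it if the client leaves the block abnormally.

    submit() returns a request token; abort(token) cancels it. Both are
    hypothetical wrappers around the client's own SRM calls.
    """
    token = submit()
    try:
        yield token
    except BaseException:
        abort(token)   # also runs on KeyboardInterrupt, not only on errors
        raise
```

A process killed outright (e.g. by SIGKILL) never reaches the `except` branch. In that case the request and pin lifetimes on the server are the only remaining clean-up, which is one argument for the timeout question above.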

srmCopy

• Was always slightly tricky (also in 1.0, 1.1)
• Needs delegation (GSI problem)
• How and when does the client check status?
• What if the remote host is not an SRM2?
• Push modes and pull modes – and firewalls
• And then the GridFTP modes (push/pull)
• And the GridFTP streams
• Can’t always get good results if the implementation uses defaults or tries to guess
• No way to set most parameters
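The push/pull and firewall interaction can be reduced to one question: which endpoint has to accept an inbound connection? The Python sketch below is deliberately simplified. It looks only at that question and ignores the GridFTP data-channel mode, which can reverse who connects to whom, and the stream count. The function name and the preference for pull when both modes work are arbitrary choices made for illustration.

```python
def choose_transfer_mode(source_accepts_inbound: bool, dest_accepts_inbound: bool):
    """Pick push or pull from each endpoint's firewall policy (hypothetical model).

    "pull": the destination side connects to the source, so the source must accept inbound.
    "push": the source side connects to the destination, so the destination must accept inbound.
    """
    if source_accepts_inbound:
        return "pull"
    if dest_accepts_inbound:
        return "push"
    return None  # neither endpoint reachable: a direct srmCopy cannot work
```

Even this toy version needs information that a client usually cannot pass to srmCopy. An implementation is left to guess, which is the complaint on the slide.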

srmLs problem

• Classical problem with large directories

• Exercise: on a normal filesystem, run ls -R on a directory tree containing large directories. While you wait, try to use the system.

• Large data volumes in SOAP
  – Attachments supported?

• Truncate, offset
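With truncation and an offset, a client can walk a huge directory in bounded pages instead of one enormous SOAP response. This is a minimal Python sketch. `ls_page(path, offset, count)` is a hypothetical wrapper around an srmLs call that returns at most `count` entries starting at `offset`.

```python
from typing import Callable, Iterator, List


def list_all(ls_page: Callable[[str, int, int], List[str]],
             path: str, count: int = 1000) -> Iterator[str]:
    """Yield every entry of a large directory, one bounded page at a time."""
    offset = 0
    while True:
        page = ls_page(path, offset, count)
        if not page:
            return              # an empty page means the listing is exhausted
        yield from page
        offset += len(page)     # advance by what was actually returned
```

The loop stops on an empty page rather than a short one. This costs one extra call, but it stays correct if the server truncates pages below the requested `count`.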

Which bits are optional…?

• Many features

• Most parameters

• TExtraInfo

Next Steps

• Continue this process
• Define terminology
• Assess “damage”
• 2.3
  – No, not yet
  – Too soon, not enough experience with 2.2
  – Adaption difficult

Options

• Do nothing
  – Too late (WLCG)

• Document differences
• Retrofit things into 2.2
• Add to 2.2 (incremental)
• Postpone to “2.3”
• Postpone to 3.1

Future Stuff

• WSRF
  – Rich Wellner (2004)
  – (WSRT?)

• Avoid duplication

• Compare OGSA-D-Arch
  – Proposes modular architecture for data

More Capabilities

• Integrity checking
  – Act when integrity checking fails?

• Service description, agreement (dynamic)
• File content
• Data sets, chunks
• Dynamic resource allocation
  – Networks, additional storage, disk servers (now known as virtualisation)
  – Recovery
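Integrity checking splits into two parts: computing a checksum and deciding what to do on a mismatch. The second part is the open question on the slide. The Python sketch below uses Adler-32 from the standard `zlib` module as one example checksum type in use in grid storage. `check_integrity` and its `on_mismatch` callback are hypothetical names, and the callback stands in for whatever policy (quarantine, re-fetch, alert) a site chooses.

```python
import zlib


def adler32_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the Adler-32 checksum of a file, reading it in chunks."""
    value = 1                                   # Adler-32 starting value
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            value = zlib.adler32(chunk, value)  # running checksum across chunks
    return f"{value:08x}"


def check_integrity(path: str, expected: str, on_mismatch):
    """Compare against a stored checksum; hand mismatches to a policy callback."""
    actual = adler32_of(path)
    if actual != expected.lower():
        return on_mismatch(path, expected, actual)
    return "ok"
```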

