GridPP3 Storage: Perspective, Achievements, Challenges

Jens Jensen, STFC RAL

GridPP20

TCD Dublin, 11-12 March 2008

Jens Jensen, STFC/RAL

“Bear with me for a moment”

• View of the past
  – Achievements
  – Lessons learned

• Present
  – SRM 2 deployment

• Future
  – To do
  – Really high-level stuff

Who we are…

• GridPP storage community
• As defined by the mailing list, it has ~55 members
  – Covers every UK site
  – Also in .ie, .nl, .ca, .pl, .it, .de

• However, not all are equally active…
  – But that’s OK
  – Isn’t it?

Support

[Diagram: Developers; Dev support; Depl. support; GridPP support; community support; (local) users]

[The same diagram, annotated: “1 person…”]

[The same diagram, redrawn]

Maybe reality is a little more complicated

storage@gridpp.ac.uk

• Open for questions; goes to Greig and Jens

• Almost all spam
  – Promising to solve our financial problems

• They tell us: “Storage, size matters”

Typical message: “Your name appeared among the beneficiaries who will receive a part-payment of US$2.8 million and has been approved already for months. You are requested to get back to me for more direction and instruction on how to receive your fund. We want to hear from you before we can make the transfer”

Status

• 2/3 of sites running DPM
  – Experimentally on Lustre (Cambridge, UCL)

• 1/3 of sites running dCache

• Tier 1 running CASTOR
  – (and dCache)

• Bristol (Jon) running StoRM

Status

• Finished CCRC 08

• Should have SRM2 deployed
  – At least for ATLAS (sites)
    • Need space token descriptions
    • Problems with space manager in dCache
  – And CMS (sites)
    • More static token descriptions initially
  – Information system secondary (tokens static)
    • Still required for accounting

• Many people worked hard to make it a success
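With tokens treated as static, an experiment's space token description simply has to map to a token that the site has configured. SRM 2.2 lets a client ask an endpoint for the tokens matching a description (`srmGetSpaceTokens`); the sketch below mimics that lookup against a static site table. The descriptions, token values and helper names are hypothetical placeholders, not the ones agreed for CCRC 08.

```python
from typing import Dict, List

# Hypothetical static site configuration: space token description -> tokens.
# Real descriptions and token values are agreed per experiment and per site.
STATIC_SPACE_TOKENS: Dict[str, List[str]] = {
    "EXAMPLE_DATADISK": ["token-0001"],
    "EXAMPLE_MCDISK": ["token-0002", "token-0003"],
}


def get_space_tokens(description: str) -> List[str]:
    """Return the tokens reserved under a description, in the spirit of
    SRM 2.2 srmGetSpaceTokens; an unknown description gives an empty list."""
    return list(STATIC_SPACE_TOKENS.get(description, []))


def choose_token(description: str) -> str:
    """Pick the first token for a description, or fail loudly so that a
    missing site configuration shows up before any transfer is attempted."""
    tokens = get_space_tokens(description)
    if not tokens:
        raise LookupError(f"no space token configured for {description!r}")
    return tokens[0]


if __name__ == "__main__":
    print(choose_token("EXAMPLE_MCDISK"))  # token-0002
```

Failing early on an unconfigured description mirrors the point that sites had to set up spaces before the experiments could use them.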

Experiences

• Went well, mostly

• SRM2 used at RAL
  – A few odd bugs and issues
  – E.g. “-0.00P” free
  – Negative file sizes (GridFTP 32-bit issue?)

• Took time to get space token descriptions agreed

• Who speaks for the experiments?

• Using spaces at T2s
  – OK for DPMers
    • Needs firewall open
    • Endpoint published
    • Spaces set up
  – Harder for dCache
    • Problems with space manager
    • But running on same port
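Two of the odd symptoms above can be reproduced in a few lines. The slide only guesses that negative file sizes came from a GridFTP 32-bit issue; the sketch assumes that cause and shows how a size above 2^31 - 1 bytes turns negative when stored in a signed 32-bit field. It also shows one assumed route to a “-0.00P” free-space figure: a tiny negative difference between total and used space, printed in petabytes to two decimals. Function names are hypothetical.

```python
import struct

PB = 10**15  # decimal petabyte, assumed for display purposes only


def as_signed_32bit(size: int) -> int:
    """Reinterpret the low 32 bits of a file size as a signed 32-bit integer,
    i.e. what a 32-bit signed field would hold (assumed cause, see above)."""
    return struct.unpack("<i", struct.pack("<I", size & 0xFFFFFFFF))[0]


def format_free_pb(total_bytes: int, used_bytes: int) -> str:
    """Free space computed as total minus used, shown in PB with two decimals."""
    free = total_bytes - used_bytes
    return f"{free / PB:.2f}P"


if __name__ == "__main__":
    three_gib = 3 * 2**30
    print(as_signed_32bit(three_gib))  # -1073741824
    # Used exceeds total by 1 GB: rounds to zero but keeps its sign.
    print(format_free_pb(500 * 10**12, 500 * 10**12 + 10**9))  # -0.00P
```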

Lessons

• No way to get through to everyone
  – Needs some effort at sites (to do what we need)
  – Workshop at NeSC was a success

• Storage is more difficult than you'd think
  – Particularly the occasional peaks
  – Implementation-specific optimisations
  – Locating the problem: complex implementations

• Need to manage risks more carefully
  – GridPP2: a surprising number of risks happened!

Risks... (dating back to Dec06 to Feb07, needs revision)

Special Achievements

• Beyond the call of duty
• Recognised internationally
• Or special benefits to users

Information Systems

[Diagram: information collected globally; used for accounting; users locate resources]

Information Systems

• Much work done on information system backends in GridPP
  – GIP plugin easier
  – DPM (Graeme, then Greig)
  – dCache debug (owned by SARA, then DESY)
  – CASTOR
    • Disk servers: Tier 1
    • CASTOR, LSF, tape robot: RAL Storage
    • Oracle databases: RAL DB group
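Roughly speaking, a GIP plugin is a program run by the information provider framework whose LDIF output on stdout updates dynamic values in entries published from static files; that is why it is the easier backend to write. The sketch below prints one storage-area entry with total, used, free and reserved online sizes in GB. The attribute names are modelled on the GLUE 1.3 storage-area size attributes, and the DN layout, identifiers and numbers are hypothetical; check them against the schema and the GIP documentation before relying on them.

```python
from typing import Dict

SIZE_KINDS = ("Total", "Used", "Free", "Reserved")


def storage_area_ldif(se_id: str, sa_local_id: str, sizes_gb: Dict[str, int]) -> str:
    """Render dynamic online sizes (GB) for one storage area as an LDIF entry.
    DN layout and attribute names are assumptions modelled on GLUE 1.3."""
    missing = [k for k in SIZE_KINDS if k not in sizes_gb]
    if missing:
        raise KeyError(f"missing sizes: {missing}")
    lines = [
        f"dn: GlueSALocalID={sa_local_id},GlueSEUniqueID={se_id},"
        "mds-vo-name=resource,o=grid"
    ]
    for kind in SIZE_KINDS:
        lines.append(f"GlueSA{kind}OnlineSize: {sizes_gb[kind]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A real plugin would query DPM, dCache or CASTOR; these values are made up.
    sizes = {"Total": 100, "Used": 40, "Free": 60, "Reserved": 0}
    print(storage_area_ldif("se.example.org", "atlas:example", sizes), end="")
```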

Special Achievements

• Accounting
  – Space “available” and “used”
  – Resource overview and selection
  – (or non-selection)

• Numerous subtle issues with space
  – What is used? Available?
  – Can the info be relied on for selection?
  – Subtle implementation issues
  – Long propeller-head discussions
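The “what is used, what is available, can it be relied on” questions can be made concrete. This is an illustration under assumed semantics, not the GLUE definition: each storage area publishes total, used and free sizes; a record is trusted only if no figure is negative and free plus used matches total within a tolerance; selection picks the trusted area with the most free space, and otherwise selects nothing. Field names, the tolerance and the endpoint names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SpaceRecord:
    """Published space figures for one storage area, in GB (hypothetical)."""
    endpoint: str
    total_gb: float
    used_gb: float
    free_gb: float


def is_consistent(rec: SpaceRecord, tolerance_gb: float = 1.0) -> bool:
    """Trust a record only if no figure is negative and free + used ~= total."""
    if min(rec.total_gb, rec.used_gb, rec.free_gb) < 0:
        return False
    return abs(rec.free_gb + rec.used_gb - rec.total_gb) <= tolerance_gb


def select_endpoint(records: Iterable[SpaceRecord], needed_gb: float) -> Optional[str]:
    """Pick the consistent record with the most free space that can hold
    needed_gb; return None (non-selection) if nothing qualifies."""
    candidates = [r for r in records if is_consistent(r) and r.free_gb >= needed_gb]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.free_gb).endpoint


if __name__ == "__main__":
    published = [
        SpaceRecord("se-a.example.org", total_gb=100.0, used_gb=40.0, free_gb=60.0),
        # Inconsistent: free + used is far larger than total, so it is ignored.
        SpaceRecord("se-b.example.org", total_gb=100.0, used_gb=10.0, free_gb=500.0),
        SpaceRecord("se-c.example.org", total_gb=200.0, used_gb=150.0, free_gb=50.0),
    ]
    print(select_endpoint(published, needed_gb=20.0))  # se-a.example.org
```

Rejecting inconsistent records trades some usable capacity for not steering users at figures that cannot be right.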

SRM/SRB interoperation using gLite

• Pretend SRB is a “Classic SE”
• Classic SE still supported by gLite FTS

[Diagram: FTS; SRM (“SRM selects pool node…”); GridFTP servers in front of disk storage; SRB disk storage; LFC]
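The trick works because a transfer service reaches the two kinds of storage differently. With an SRM, the client first asks the SRM for a transfer URL (in SRM 2.2, via srmPrepareToGet or srmPrepareToPut) and the SRM selects the pool node whose GridFTP server will move the data. With a Classic SE, the client talks to a GridFTP server directly. The sketch below models that difference with hypothetical classes and host names and uses round-robin as a stand-in for the SRM's node selection; it is not how FTS or any real SRM is implemented.

```python
import itertools
from typing import List, Tuple


class ClassicSE:
    """A plain GridFTP server: the transfer URL is just host plus path."""

    def __init__(self, host: str) -> None:
        self.host = host

    def transfer_url(self, path: str) -> str:
        return f"gsiftp://{self.host}{path}"


class SRMEndpoint:
    """Hypothetical SRM front end that hands out a pool node per request.
    Round-robin stands in for whatever selection policy a real SRM uses."""

    def __init__(self, pool_nodes: List[str]) -> None:
        self._nodes = itertools.cycle(pool_nodes)

    def transfer_url(self, path: str) -> str:
        node = next(self._nodes)  # the SRM, not the client, chooses the node
        return f"gsiftp://{node}{path}"


def plan_transfer(source, destination, path: str) -> Tuple[str, str]:
    """Resolve both ends to GridFTP URLs; the caller need not know whether
    an end is an SRM or a Classic SE (e.g. SRB behind a GridFTP server)."""
    return source.transfer_url(path), destination.transfer_url(path)


if __name__ == "__main__":
    srm = SRMEndpoint(["pool1.example.org", "pool2.example.org"])
    srb_as_classic = ClassicSE("srb-gridftp.example.org")
    print(plan_transfer(srm, srb_as_classic, "/data/file1"))
    print(plan_transfer(srm, srb_as_classic, "/data/file2"))
```

Because both classes expose the same `transfer_url` method, the transfer planner treats SRB exactly like any other Classic SE, which is the point of the “pretend” approach.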

Achievements – FTS monitoring

Achievements – standards

• SRM 2.2 is now an OGF standard
  – Collaboration between SRM developers
  – …and WLCG
  – New challenges ahead

• GLUE
  – Contributed to GLUE SE schema
  – 1.3, also some for 2.0

What Keeps the Unreasonable (Wo)Man Awake at Night?

• CUS: Campaign for Usable Storage
• Fabric
• Staff...!!
• Coordination

What is Usable Storage

• Users: “we want usable storage”

• Deployment: “storage is usable if it’s being used”
  – Not necessarily…

• Identified (currently) 13 areas
  – Somewhat overlapping
  – But that is normal

What is Usable Storage

• Robust
  – Doesn’t fall over
  – Measure uptime (for some definition of uptime)

• Good performance
  – Requests per second, concurrent users
  – Can be tested! DESY did this for dCache, Dave Newbold for CASTOR, ScotGrid for DPM and dCache
  – (Also tests the SRM itself)
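Requests per second under a given number of concurrent users is the kind of figure those tests produce. A minimal harness might look like the following: it issues a fixed number of requests from a pool of concurrent “users” and reports throughput and failures. The `fake_srm_ping` stub and its 10 ms latency are assumptions standing in for a real SRM client call; this says nothing about how the DESY, CASTOR or ScotGrid tests were actually run.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fake_srm_ping() -> bool:
    """Hypothetical stand-in for one SRM request; sleeps to mimic latency."""
    time.sleep(0.01)
    return True


def measure_rate(request: Callable[[], bool], total_requests: int,
                 concurrent_users: int) -> dict:
    """Issue total_requests calls from concurrent_users threads and report
    throughput. A call returning False counts as a failure; a call that
    raises aborts the run (the exception propagates)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        results = list(pool.map(lambda _: request(), range(total_requests)))
    elapsed = time.perf_counter() - start
    failures = sum(1 for ok in results if not ok)
    return {
        "requests": total_requests,
        "failures": failures,
        "seconds": elapsed,
        "requests_per_second": total_requests / elapsed,
    }


if __name__ == "__main__":
    for users in (1, 10, 50):
        stats = measure_rate(fake_srm_ping, total_requests=100, concurrent_users=users)
        print(users, "users:", round(stats["requests_per_second"]), "req/s")
```

Against a real endpoint, the interesting part is where the rate stops scaling with users, since that is where the SRM itself becomes the bottleneck.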

What is Usable Storage

• Good overall data performance
  – Tests the data movers and networks
  – Experiments are good at this
  – Also third-party transfers, and to tape
  – Optimisations

• Ensures resource availability
  – Concurrent users (other experiments, same experiment)
  – Ancient available/used metrics
  – Load balancing, dynamic allocation
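Load balancing, so that concurrent users from one or several experiments do not all land on the same server, can be sketched as least-loaded assignment across pool nodes. The node names and the use of “active transfers” as the load measure are assumptions; real storage systems use their own, often more elaborate, selection policies.

```python
from typing import Dict, List


class LeastLoadedBalancer:
    """Assign each new transfer to the pool node with the fewest active
    transfers, breaking ties by node name. Hypothetical, illustration only."""

    def __init__(self, nodes: List[str]) -> None:
        self.active: Dict[str, int] = {n: 0 for n in nodes}

    def assign(self) -> str:
        node = min(self.active, key=lambda n: (self.active[n], n))
        self.active[node] += 1
        return node

    def finish(self, node: str) -> None:
        if self.active[node] == 0:
            raise ValueError(f"{node} has no active transfers")
        self.active[node] -= 1


if __name__ == "__main__":
    lb = LeastLoadedBalancer(["pool1", "pool2", "pool3"])
    print([lb.assign() for _ in range(4)])  # ['pool1', 'pool2', 'pool3', 'pool1']
    lb.finish("pool2")
    print(lb.assign())  # pool2
```

Unlike round-robin, this reacts to transfers finishing at different times, which matters when file sizes, and therefore transfer durations, vary widely.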

What is Usable Storage

• Monitored, accountable
  – See when something goes wrong
  – Reliable accounting data
  – Minimise downtime

• Maintainable
  – Easy upgrade, installation and configuration
  – Minimise downtime

• Tested (prior to release)
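“Minimise downtime” and “measure uptime (for some definition of uptime)” both need a concrete definition before they can be monitored. One possible definition, assumed here rather than taken from the talk: probe the service periodically and report the fraction of probes that succeeded, optionally leaving out samples taken during scheduled downtime. The probe format and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    """One monitoring sample (hypothetical format)."""
    ok: bool
    scheduled_downtime: bool = False


def availability(probes: List[Probe], exclude_scheduled: bool = True) -> float:
    """Fraction of probes that succeeded. With exclude_scheduled, samples
    taken during announced downtime are dropped first, which is one of
    several reasonable definitions of 'uptime'."""
    counted = [p for p in probes if not (exclude_scheduled and p.scheduled_downtime)]
    if not counted:
        raise ValueError("no probes to evaluate")
    return sum(p.ok for p in counted) / len(counted)


if __name__ == "__main__":
    day = ([Probe(True)] * 20
           + [Probe(False, scheduled_downtime=True)] * 2
           + [Probe(False)] * 2)
    print(f"{availability(day):.3f}")                           # 0.909
    print(f"{availability(day, exclude_scheduled=False):.3f}")  # 0.833
```

The two printed figures describe the same day; which one is “the uptime” is exactly the definitional choice the slide alludes to.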

What is Usable Storage

• Standards-compliant and interoperable
  – Provides SRM 2.2 / GLUE 1.3 / GridFTP
  – Extensive test suite available

• Secure
  – Access control, secure implementations

• Supported
  – Upstream: developers

• Publishing metadata in current schema

• Usable by applications (interfaces)

Challenges

[Diagram: Challenges, grouped as Services; Capabilities; Scale, Performance; Economy, Sustainability; Middleware; State of the Art; Users]

Users

[Diagram: Users: Applications; Culture, History; Customer management; Usability]

Services

[Diagram: Services: Trust; Availability; Accounting; Discovery]

State of the Art

[Diagram: State of the Art: Web Services; Virtualisation; Media]

Middleware

[Diagram: Middleware: Stability; Applications; Maintenance, Support; Ease of install and config]

Scale, Performance

[Diagram: Scale, Performance: Staging; Transfer rates; Size of files; Number of files; Volume]
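Why the number and size of files matter alongside volume and transfer rate: if every file carries a fixed overhead (SRM negotiation, connection set-up, perhaps a tape stage-in), moving the same volume as many small files takes longer. The simple model below, t = N * t_file + V / R, and its overhead and rate values are assumptions for illustration, not measurements.

```python
def transfer_hours(volume_tb: float, n_files: int, rate_mb_s: float,
                   overhead_s: float) -> float:
    """Hours to move volume_tb terabytes as n_files files at a sustained
    rate_mb_s MB/s, with a fixed per-file overhead. Assumed model:
    t = n_files * overhead_s + volume / rate, with overheads not overlapped."""
    volume_mb = volume_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    seconds = n_files * overhead_s + volume_mb / rate_mb_s
    return seconds / 3600


if __name__ == "__main__":
    # Same 10 TB at 100 MB/s with 5 s per-file overhead (all assumed values).
    for n in (5_000, 500_000):
        print(n, "files:", round(transfer_hours(10, n, 100, 5), 1), "hours")
    # 5000 files: 34.7 hours; 500000 files: 722.2 hours
```

Real transfer services run many files in parallel, which shrinks the overhead term but does not remove it, so file count stays a scaling concern in its own right.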

Sustainability, Economy

[Diagram: Economy: Scale; Trust; Dynamic Agreement; Cost Model]

Capabilities

[Diagram: Capabilities: Content; Access; Curation; SECURITY]

Conclusion

• Lots of things achieved

• Lots of stuff to do
  – Somehow always harder than expected
  – Doesn’t asymptotically tend to zero
  – Plus there are regular peaks, so it doesn’t even converge

• Storage is important! It should not be underestimated

• Good community to go forward into GridPP3