
Data Transfer Efficiency - leave no byte unchurned

Jens Jensen, Rutherford Appleton Laboratory

GridPP26, U Sussex, March 2011

Background

• GridPP’s data grid
  – Distributed Storage Elements
  – Data movers (FTS, PhEDEx et al)
  – Catalogues (usually replica catalogues)

• e-Infrastructure (aka cyberinfrastructure)

• (Presentation at ISGC)

The Data Grid

• WLCG is primarily a data grid
  – Computation can (in principle) be redone
• Jobs go to where the data is
  – Moving a job is quicker than moving data
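To put rough numbers on "moving a job is quicker than moving data", here is a back-of-envelope sketch. The sizes and link speed are illustrative assumptions, not figures from the talk (a 10 MB job sandbox, a 2 TB input dataset, a 1 Gbit/s link), and the model ignores latency and protocol overhead.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float) -> float:
    """Idealised transfer time: size / bandwidth (no latency, no protocol overhead)."""
    return size_bytes * 8 / link_bits_per_s


GB = 10**9
LINK = 1e9  # assumed 1 Gbit/s link (illustrative)

job_sandbox = 0.01 * GB  # assumed 10 MB of executable + configuration
dataset = 2000 * GB      # assumed 2 TB of input data

t_job = transfer_seconds(job_sandbox, LINK)
t_data = transfer_seconds(dataset, LINK)

print(f"job:  {t_job:.2f} s")          # job:  0.08 s
print(f"data: {t_data / 3600:.1f} h")  # data: 4.4 h
```

Even under these generous assumptions the ratio is about five orders of magnitude, which is why the job goes to the data.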

Premature Optimisation is the Root of All Evil

Postmature non-optimisation is the root of some evil

• The role of infrastructure code

  – Scientist as a programmer
  – “Bad” code moves up the stack?
  – “Bad” code improves over time?

• Doofers (“do for now” fixes) stay in production

Efficiencaciousness Goals

Service
• Availability
• Performance
• Grows as needed
• Robust (no SPoF?)

People
• (Effective) support
• Training
• Expertise
• Availability of…

Approaches
• Philosophy
  – Get it done: WLCG
  – Get it done right: EGI?
  – Do It Perfectly The First Time…

• Evolutionary (control system) vs revolutionary
  – Proactive vs reactive

Efficiencaciousness Issues

• Failures
  – Sites: BDII, network
  – Elements: storage
  – Components: disk servers

• Timeouts
• DDoS


• Overall effort
  – Funded, contributed, external

• Availability of expertise
  – Single Point of Knowledge

• Decoherence
• 2nd Law of Thermodynamics
• Learning from incidents


• Primary communication
  – Sites
  – Users: large VOs, small VOs, single users
  – PMB

• Secondary
  – WLCG
  – NGS


• Sites
  – There Is Always A Bottleneck Somewhere
  – Site dependent
  – Usage dependent

• Information
  – Freshness
  – Accuracy (“spped is substute fo accurcy”)
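"There Is Always A Bottleneck Somewhere" can be made concrete. In a pipelined transfer where all stages run concurrently, the steady-state end-to-end rate is set by the slowest stage, so tuning any other stage changes nothing. The sketch below assumes that simple pipeline model, and the stage names and MB/s figures are hypothetical.

```python
def bottleneck(stage_rates_mb_s: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate.

    Assumes a pipelined transfer with all stages running concurrently, so the
    steady-state end-to-end rate equals the slowest stage's rate.
    """
    slowest = min(stage_rates_mb_s, key=stage_rates_mb_s.__getitem__)
    return slowest, stage_rates_mb_s[slowest]


# Hypothetical per-stage rates (MB/s) for one site-to-site transfer.
path = {
    "source disk read": 800.0,
    "source NIC": 1250.0,
    "WAN share": 400.0,
    "destination NIC": 1250.0,
    "destination disk write": 600.0,
}

stage, rate = bottleneck(path)
hours_per_tb = 1_000_000 / rate / 3600  # 1 TB = 10^6 MB
print(f"bottleneck: {stage} at {rate:.0f} MB/s, {hours_per_tb:.2f} h per TB")
# bottleneck: WAN share at 400 MB/s, 0.69 h per TB
```

Doubling the source disk rate in `path` would leave the result unchanged. Which stage is the bottleneck depends on the site and on the usage pattern, as the slide notes.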


• Usage patterns
  – Cf. Wahid’s talk yesterday
  – WAN vs LAN (worker node) traffic

• Technology
  – In the narrow sense (drives, controllers)
  – And in the wider sense: distributed filesystems

• Support: upstream (EGI), fabric


• Overheads
  – Complexity of using the stack (see next)
  – Infrastructure is complex
  – But Complexity Has To Go Somewhere

• Time-to-production
  – Testing, troubleshooting, monitoring, tweaking, tuning

Expt
• DDM et al

Data movers
• FTS
• Catalogues

Data control
• SRM
• SE GRIS

Transport
• WAN: GridFTP
• LAN: RFIO, DCAP, …

Network
• Routers, switches, firewalls, OPN

Fabric
• HDD, SSD, tapes
• Network cards, disk/RAID controllers

With apologies to the OSI stack
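One way to read the layered stack is as a dependency chain: a problem seen at one layer may originate in any layer beneath it. The sketch below encodes the slide's layers and example components as a Python data structure. The `Layer` numbering and the helper names `layer_of` and `layers_beneath` are hypothetical illustrations, and the component lists are deliberately partial because the slide's own lists are open-ended.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Layers of the data-management stack on the slide, numbered bottom-up."""
    FABRIC = 0
    NETWORK = 1
    TRANSPORT = 2
    DATA_CONTROL = 3
    DATA_MOVERS = 4
    EXPERIMENT = 5


# Example components per layer, taken from the slide (incomplete by design).
COMPONENTS: dict[Layer, list[str]] = {
    Layer.EXPERIMENT: ["DDM"],
    Layer.DATA_MOVERS: ["FTS", "Catalogues"],
    Layer.DATA_CONTROL: ["SRM", "SE GRIS"],
    Layer.TRANSPORT: ["GridFTP", "RFIO", "DCAP"],
    Layer.NETWORK: ["Routers", "Switches", "Firewalls", "OPN"],
    Layer.FABRIC: ["HDD", "SSD", "Tapes", "Network cards", "Disk/RAID controllers"],
}


def layer_of(component: str) -> Layer:
    """Return the layer a named component sits in (KeyError if unknown)."""
    for layer, names in COMPONENTS.items():
        if component in names:
            return layer
    raise KeyError(component)


def layers_beneath(component: str) -> list[Layer]:
    """Layers a component depends on, from the next one down to the fabric."""
    own = layer_of(component)
    return [layer for layer in sorted(Layer, reverse=True) if layer < own]


print([layer.name for layer in layers_beneath("FTS")])
# ['DATA_CONTROL', 'TRANSPORT', 'NETWORK', 'FABRIC']
```

A failed FTS transfer, for example, could in principle be caused by anything from SRM down to a disk controller. This is one way to see why "Complexity Has To Go Somewhere".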

PROGRESS

Particular Pain Point Principle

Progressing Forward

• What is progress?
• How do we measure progress?

The Good News

• We’ve come a long way
• Don’t think there is a skills gap
  – But some SPoKs (Single Points of Knowledge)

Graeme’s talk

• “Get the best out of what we can afford to buy”

• Proactive sites do better
• Standards are good

E[GM]I involvement

• EMI data roadmap
  – Support for dCache, DPM, StoRM
  – Support for standards (NFS4, CDMI)

• But then
  – StoRM = INFN, dCache = DESY, DPM = CERN

The Cloud View

• Supplement resources with on-demand capacity

• Agile
• CDMI is a superset of SRM
  – But using ReST+JSON, not SOAP

(Open) Standards

• Standards promote interoperation and stability

• Interoperation
• Multiple (independent) implementations
  – Both Java and (C or C++)

The Case for Non-HEP Data

• Benefit from non-HEP data
  – Outreach
  – Benefit to society (e.g. saving lives)

• NGI interop (at compute)
• Others…

SUMMARY

Efficiencaciousness Goals

Service
• Availability
• Performance
• Grows as needed
• Robust (no SPoF?)

People
• (Effective) support
• Training
• Expertise
• Availability of…
