
The Data Logistics Toolkit

Martin Swany

Professor, School of Informatics and Computing

Executive Associate Director, Center for Research in Extreme Scale Computing (CREST)

Indiana University

The Data Logistics Toolkit

• Logistics - the management of the flow of resources from the point of origin to the point of consumption

• The DLT integrates local and distributed storage infrastructure, file transfer software, performance monitoring and tuning

• The DLT software distribution supports the creation of network-optimized data nodes

DLT Overview

• Set of packages with configuration scripts, etc.

• Allows the configuration of
  – DTN with GridFTP
  – IBP storage depot for content distribution
  – Phoebus WAN accelerator
  – On-ramp for Internet2 AL2S using XSP

• Includes Periscope/perfSONAR monitoring

• Automatic network tuning

DTN with AL2S On-Ramp

• Working with the Globus team at U. Chicago and Argonne

• Leveraging our eXtensible Session Protocol (XSP) to create end-to-end “sessions”
  – user-network interface (UNI)

• XSP daemon acts as network controller
  – signals AL2S/OESS, OSCARS, OpenFlow

• GridFTP XIO driver, updating to use the Globus Transfer Network Controller API

• Generic, transparent on-ramp to circuit networks like AL2S

WAN Acceleration

• A key reason the Science DMZ model “works” is the separation of lossy access networks from high-bandwidth, long-latency links

• Termination of TCP connections in “middleboxes” can increase throughput by reducing the RTT that each connection sees

• Protocol translation

• Storage in the network to buffer and burst
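
The RTT argument above can be illustrated with the widely used Mathis et al. approximation for loss-limited TCP throughput, rate ≤ (MSS/RTT)·(C/√p). The slides do not cite this model, so treat it as an assumption; the MSS, RTT and loss values below are illustrative, not measurements from the DLT. A minimal sketch:

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Approximate upper bound on steady-state TCP throughput, in bits/s.

    Mathis et al. model: rate <= (MSS / RTT) * (C / sqrt(p)).
    Only meaningful for loss_rate > 0.
    """
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

# Illustrative assumptions (not taken from DLT measurements):
MSS = 1460          # bytes, typical Ethernet MSS
LAN_RTT = 0.004     # 4 ms RTT on the lossy access/edge network
WAN_RTT = 0.100     # 100 ms RTT on the clean long-haul link
EDGE_LOSS = 1e-5    # 0.001% packet loss at the edge

# One end-to-end connection: edge loss is recovered over the full RTT.
end_to_end = mathis_throughput_bps(MSS, LAN_RTT + WAN_RTT, EDGE_LOSS)

# Connection terminated at a middlebox: edge loss is recovered over
# the short LAN RTT only.
edge_only = mathis_throughput_bps(MSS, LAN_RTT, EDGE_LOSS)

print(f"end-to-end bound: {end_to_end / 1e6:.1f} Mbit/s")
print(f"edge-segment bound: {edge_only / 1e6:.1f} Mbit/s")
print(f"improvement factor: {edge_only / end_to_end:.1f}x")
```

Under this model the bound on the lossy segment improves by the ratio of the two RTTs. The loss-free WAN segment is not governed by this formula; its rate is limited by window size and link capacity instead.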

Distributed Storage for Content Distribution

• IBP provides a primitive, scalable, in-network storage service

• File-like abstractions can be built on top of this

• Uses a data structure known as an exNode (like a Unix inode) to track allocations

• These basic building blocks can be used to build various instances
  – Parallel filesystem
  – Distributed RAID-like storage
  – Content distribution network
  – BitTorrent-like peer-to-peer transfers
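
The inode analogy can be sketched in a few lines. The class and field names below (`Extent`, `ExNode`, `depot`, `allocation`) and the depot hostnames are hypothetical and chosen for illustration; they are not the actual exNode schema or the IBP API. The idea shown is only that a logical file is a list of byte ranges, each pointing at an allocation on some storage depot, with replicas expressed as overlapping ranges.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Extent:
    offset: int      # start of this byte range within the logical file
    length: int      # number of bytes in the range
    depot: str       # hypothetical depot address
    allocation: str  # hypothetical handle for the allocation on that depot

@dataclass
class ExNode:
    name: str
    size: int
    extents: list = field(default_factory=list)

    def add(self, offset, length, depot, allocation):
        self.extents.append(Extent(offset, length, depot, allocation))

    def locate(self, offset, length):
        """Return every extent overlapping [offset, offset + length)."""
        end = offset + length
        return [e for e in self.extents
                if e.offset < end and offset < e.offset + e.length]

    def is_covered(self):
        """True if each byte of the file lies in at least one extent."""
        pos = 0
        for e in sorted(self.extents, key=lambda e: e.offset):
            if e.offset > pos:
                return False  # gap before this extent
            pos = max(pos, e.offset + e.length)
        return pos >= self.size

# A 300-byte file striped over three depots, with the first stripe replicated.
node = ExNode("scene.tif", 300)
node.add(0, 100, "depot-a.example.org", "alloc-1")
node.add(100, 100, "depot-b.example.org", "alloc-2")
node.add(200, 100, "depot-c.example.org", "alloc-3")
node.add(0, 100, "depot-c.example.org", "alloc-4")  # replica of stripe 1

print(node.is_covered())
print([e.allocation for e in node.locate(150, 100)])
```

Striping across depots gives the parallel-filesystem and RAID-like cases listed above, while overlapping extents on different depots give replication for content distribution.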

Architecture

• Unified Network Information Service (UNIS)
  – Descendant of perfSONAR Lookup and Topology Services
  – Network and service “graph”

• Intelligent Data Movement Service (IDMS)
  – Data dispatcher
  – Operates on UNIS data
  – Spawn storage services dynamically in GENI

• Periscope/perfSONAR
  – Monitoring for operational integrity and optimization, BLiPP

• Storage Services
  – IBP, prototype based on Ceph

• Other services
  – Data transfer (GridFTP), WAN acceleration

Earth Observation Depot Network (EODN)

An open, community-specific content distribution network for remote sensing data

Landsat data

• Landsat 8 launched February 11th, 2013

• Covers the entire land surface of the Earth every 16 days
  – 8 day offset from Landsat 7
  – ~700 scenes each day

• Each scene contains a GeoTIFF product: high-resolution sensor images
  – ~1GB compressed, 2GB uncompressed

• Traditionally used for environmental monitoring and land use and land cover change studies
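
The scene counts and sizes above imply a rough data volume that a distribution network has to carry. The arithmetic below is a back-of-the-envelope sketch: it treats the approximate slide figures as exact, assumes decimal units (1 GB = 10^9 bytes), and assumes scenes arrive evenly over the day.

```python
# Figures taken from the slide (all approximate).
SCENES_PER_DAY = 700
COMPRESSED_GB_PER_SCENE = 1
UNCOMPRESSED_GB_PER_SCENE = 2
REPEAT_CYCLE_DAYS = 16

SECONDS_PER_DAY = 24 * 60 * 60

daily_compressed_gb = SCENES_PER_DAY * COMPRESSED_GB_PER_SCENE
daily_uncompressed_gb = SCENES_PER_DAY * UNCOMPRESSED_GB_PER_SCENE

# Volume of one full 16-day coverage cycle, in TB (1 TB = 1000 GB).
cycle_compressed_tb = daily_compressed_gb * REPEAT_CYCLE_DAYS / 1000

# Average rate needed just to keep up with new compressed scenes.
sustained_mbps = daily_compressed_gb * 1e9 * 8 / SECONDS_PER_DAY / 1e6

print(f"compressed per day:   {daily_compressed_gb} GB")
print(f"uncompressed per day: {daily_uncompressed_gb} GB")
print(f"compressed per cycle: {cycle_compressed_tb:.1f} TB")
print(f"sustained ingest:     {sustained_mbps:.1f} Mbit/s")
```

The average ingest rate is modest; the harder requirement is delivering individual multi-gigabyte scenes quickly to many consumers, which is the case a content distribution network is meant to address.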

EODN

[Figure: EODN workflow diagram. Labels: Client; EODN (DLT) with WISC, IU, NYSER, MIZZ; RealEarth, UW-Madison; UNIS, DMS (discover / measure); EODN Harvester; webGUI; Landsat Ground Network. Steps: (1) subscribe, (2) harvest, (3) stage sensing data, (4) publish, (5) fast download, (6) Processing…, (7) WMS upload]

Cisco Appliance Platform

• In collaboration with Internet2, Cisco and Fusion-io

• Cisco C220 server
  – 2x Intel® Xeon® E5-2680, 16 cores @ 4GHz, 64GB DDR3 RAM
  – Fusion-io ioDrive2 1.2 TB

• CentOS 6.4 Linux with DLT RPMs and tuning for data transfer throughput


Acknowledgements

• Staff Scientist Dr. Ezra Kissel leads the DLT development efforts and is PI of the GENI IDMS effort

• CC-NIE integration project with U. Tennessee and Vanderbilt U.

• CC-NIE integration project with the Globus team at U. Chicago and Argonne Nat’l Lab

• EODN development with AmericaView, U. Wisconsin

Phoebus-SLaBS performance

GridFTP transfers over dedicated 10G path, increasing WAN latency, 4ms LAN RTT and 0.001% edge loss

