Troubleshooting GridFTP flows with XSP and Periscope

Dan Gunter, presenter
Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany

Outline
  Motivation
  Review of perfSONAR
  PerfSONAR issues
  New components to address them
    UNIS
    Periscope
    XSP
    NLMI
  E2E example with GridFTP
  Visualizations from SC10 demo
  Questions & rotten fruit

2/1/2011 Internet2 Joint Techs 2011. Clemson, SC

Motivating Use-Cases
  Analyzing PBs of experimental data on an HPC cluster
  Offloading or disseminating PBs of simulation output
  Large data transfers

source: http://xkcd.com/401/

PerfSONAR Overview
  Infrastructure & software for network performance analysis

Abbr  Name                    Purpose
LS    Lookup service          Find sources of measurements
TopS  Topology Service        Describe network topology
MP    Measurement point       Retrieve/publish measurements
MA    Measurement archive     Store/publish measurements
TS    Transformation service  Aggregate, sample, smooth measurements

[Diagram labels: User or Application; Discovery; Data]

Motivating questions
  How can we accurately forecast application performance?
  How can we detect performance anomalies in real-time?
  How can we troubleshoot poor application performance?
    And improve it!
  ‘Shooting the gap between expectation and reality

PerfSONAR issues
① Data is hard to find
    Cannot simply ask “which MPs have data for path”
② Slow
    Lookups across multiple domains
    Polling for data = RTT_net + Delay_DB + Delay_WS
    XML serialization/deserialization
③ E2E analysis is difficult
    No integrated host, application monitoring
    Analysis/visualization done client-side and not exported
④ Measurement frequency is static
    Always-on and lack of aggregation encourages large intervals
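
The polling cost listed under issue ② can be made concrete with a small model. This is only a sketch: the millisecond values are invented placeholders, not measurements, and the assumption that domains are polled one after another is ours rather than something stated on the slide.

```python
def poll_latency_ms(rtt_net, delay_db, delay_ws):
    """Cost of one poll, per the slide: RTT_net + Delay_DB + Delay_WS."""
    return rtt_net + delay_db + delay_ws


def sequential_lookup_ms(domains):
    """Total latency if each domain is polled in turn (our assumption)."""
    return sum(poll_latency_ms(*d) for d in domains)


# Hypothetical (rtt_net, delay_db, delay_ws) values in milliseconds.
domains = [(40, 15, 120), (85, 20, 150), (10, 5, 90)]
total = sequential_lookup_ms(domains)
print(total)
```

Under this model every additional domain adds a full round trip plus database and web-service delay, which is the cost a cache closer to the client is meant to avoid.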

① Data is hard to find

Unified Network Information Service (UNIS)
  Merges TopS & LS
  Topology model
    Tree of nodes at different layers (Network/Node/Port)
    Relations between arbitrary nodes
    Node properties
  ‘GIS for networks’
  Relates MPs, MAs to topology
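
A minimal sketch of the topology model described on this slide, assuming a simple in-memory tree. The class names (`TopoNode`, `Topology`), the method names, and identifiers such as `mp-snmp-1` are invented for illustration; they are not the UNIS schema or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopoNode:
    """One element of the tree: a network, a node, or a port."""
    id: str
    layer: str                                  # "network", "node", or "port"
    parent: Optional["TopoNode"] = None
    properties: dict = field(default_factory=dict)


class Topology:
    def __init__(self):
        self.nodes = {}        # id -> TopoNode
        self.relations = []    # (type, from_id, to_id) between arbitrary nodes
        self.services = {}     # topology id -> names of MPs/MAs attached there

    def add(self, id, layer, parent=None, **properties):
        node = TopoNode(id, layer, self.nodes.get(parent), properties)
        self.nodes[id] = node
        return node

    def relate(self, rel_type, from_id, to_id):
        self.relations.append((rel_type, from_id, to_id))

    def attach_service(self, topo_id, service):
        self.services.setdefault(topo_id, []).append(service)

    def services_for_path(self, port_ids):
        """Answer 'which MPs have data for path' by walking each port up the tree."""
        found = []
        for port_id in port_ids:
            node = self.nodes[port_id]
            while node is not None:
                for svc in self.services.get(node.id, []):
                    if svc not in found:
                        found.append(svc)
                node = node.parent
        return found


topo = Topology()
topo.add("net-a", "network")
topo.add("router-1", "node", parent="net-a")
topo.add("router-1:ge-0", "port", parent="router-1")
topo.add("router-2", "node", parent="net-a")
topo.add("router-2:ge-3", "port", parent="router-2")
topo.relate("link", "router-1:ge-0", "router-2:ge-3")
topo.attach_service("router-1:ge-0", "mp-snmp-1")
topo.attach_service("net-a", "ma-archive-a")

path_services = topo.services_for_path(["router-1:ge-0", "router-2:ge-3"])
print(path_services)
```

Because services are attached to topology elements, the path query becomes a tree walk instead of a search across independent lookup services.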

② Slow

Periscope: Topologically aware cache
  PerfSONAR requests have topological locality
  Pre-fetch and cache relevant perfSONAR information
  New protocols to indicate interesting sub-topologies
  Analysis functions
    domain-specific transformations, e.g. forecasting
    visualization (whee!)
  Preserve uniform perfSONAR interface

[Diagram labels: User or Application; perfSONAR interface; Periscope; MP/MA; LS; ...]
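
The pre-fetching idea can be sketched as a cache that also loads the topological neighbours of whatever was requested. This is an illustration under stated assumptions, not Periscope's implementation: `TopologyAwareCache`, `slow_fetch`, and the `hop1`..`hop3` adjacency map are all hypothetical, and a real service would presumably pre-fetch asynchronously.

```python
class TopologyAwareCache:
    """Cache that pre-fetches topological neighbours of each requested element."""

    def __init__(self, fetch, neighbours):
        self.fetch = fetch              # callable: element id -> measurement data
        self.neighbours = neighbours    # dict: element id -> adjacent element ids
        self.store = {}
        self.backend_calls = 0

    def _load(self, element):
        if element not in self.store:
            self.store[element] = self.fetch(element)
            self.backend_calls += 1

    def get(self, element):
        self._load(element)
        # Exploit topological locality: the next request is likely nearby.
        for near in self.neighbours.get(element, ()):
            self._load(near)
        return self.store[element]


def slow_fetch(element):
    # Stand-in for a perfSONAR MA query; returns fake data.
    return {"element": element, "utilization": 0.5}


links = {"hop1": ["hop2"], "hop2": ["hop1", "hop3"], "hop3": ["hop2"]}
cache = TopologyAwareCache(slow_fetch, links)

cache.get("hop1")                      # fetches hop1 and pre-fetches hop2
calls_after_first = cache.backend_calls
cache.get("hop2")                      # hop2 is already cached; hop3 is pre-fetched
print(calls_after_first, cache.backend_calls)
```

A client walking a path hop by hop then finds each hop already cached by the time it asks for it.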

Periscope data representation
  Follow PerfSONAR data model
  But use a simpler, more efficient format
  Many good options:
    JSON ✔
    BSON ✔
    Thrift
    Avro
    Protobuf
    NetLogger
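
To illustrate why a format such as JSON counts as simpler here, the sketch below serializes one record both ways using only the Python standard library. The record and its field names are hypothetical; neither the JSON nor the XML shown is the actual Periscope or perfSONAR encoding.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical measurement record; the field names are illustrative only.
record = {
    "subject": "urn:ogf:network:domain=ps.es.net:node=albu-cr1",
    "eventType": "utilization",
    "ts": 1296518400,          # example epoch timestamp
    "value": 0.42,
}

# Compact JSON encoding of the record.
as_json = json.dumps(record, separators=(",", ":"))

# The same fields as a flat XML element, one child per field.
root = ET.Element("datum")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), len(as_xml))
print(json.loads(as_json) == record)   # JSON round-trips with types intact
```

The JSON form keeps numeric types and needs no closing tags, so it is both shorter and cheaper to decode for a record like this one.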

③ E2E Analysis is Difficult

Missing metrics

Network layers
OSI Layer     perfSONAR metrics
Application   X
Presentation  X
Session       X
Transport     bandwidth, delay
Network       capacity, bandwidth, delay
Data link     availability, loss, errors
Physical      availability, errors

End-to-end components
E2E Component   perfSONAR metrics
Disk            X
Host / Cluster  X
Network         “yes”

NetLogger Machine Information (NLMI)
  Basic set of host probes, using /proc
    Host interface statistics
    TCP settings
    CPU, memory
    Disk I/O
  Export data in Periscope data model
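
A host-interface probe built on /proc can be quite small. The sketch below parses the Linux `/proc/net/dev` counter table; it is not NLMI code, the function names and the chosen output keys are our own, and the sample text uses made-up counter values.

```python
def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {interface: counters}."""
    stats = {}
    for line in text.splitlines()[2:]:      # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        # Columns 0-7 are receive counters, 8-15 are transmit counters.
        stats[name.strip()] = {
            "rx_bytes": fields[0], "rx_packets": fields[1], "rx_errs": fields[2],
            "tx_bytes": fields[8], "tx_packets": fields[9], "tx_errs": fields[10],
        }
    return stats


def read_host_interfaces(path="/proc/net/dev"):
    """Read live counters on a Linux host."""
    with open(path) as f:
        return parse_net_dev(f.read())


# Made-up sample in the /proc/net/dev layout, so the parser runs anywhere.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 5000 50 2 0 0 0 0 0 7000 70 1 0 0 0 0 0
"""

sample_stats = parse_net_dev(SAMPLE)
print(sample_stats["eth0"])
```

Because these counters are cumulative, a probe would sample them periodically and report the difference between samples.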

④ Measurement frequency is static

eXtensible Session Protocol (XSP)
  Establishment, termination, and negotiation of a session between end-user application processes
  Session = stateful layer over multiple other NEs
  In-band or OOB signaling of control information
    Other metadata can also be forwarded

[Diagram labels: A, B, C; TCP; xspd; App; Session; NE; Metadata]

Monitoring GridFTP
  GridFTP’s XIO allows interception of I/O
  New XIO layer can talk to a local xspd
    Signaling: open/close
    Performance: aggregated read/write
  NetLogger’s nlcalipers library aggregates reads/writes into periodic summaries

[Diagram labels: GridFTP server; XIO layer; XIO/XSP; Disk and Network; operation; xspd; signaling; performance]
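
The aggregation step (many individual reads and writes reduced to periodic summaries) can be sketched as follows. This is not the nlcalipers API: `IntervalAggregator`, its fields, and the summary keys are hypothetical names chosen for the example, and the real library may compute different statistics.

```python
class IntervalAggregator:
    """Collapse many per-call I/O records into one summary per time interval."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self.window_start = None
        self.count = 0
        self.nbytes = 0
        self.busy = 0.0
        self.summaries = []

    def record(self, start, duration, nbytes):
        """Add one read or write that began at `start` (seconds)."""
        if self.window_start is None:
            self.window_start = start
        # Close any windows that ended before this operation began.
        while start >= self.window_start + self.interval:
            self.flush()
        self.count += 1
        self.nbytes += nbytes
        self.busy += duration

    def flush(self):
        """Emit the current window (if non-empty) and move to the next one."""
        if self.window_start is None:
            return
        if self.count:
            self.summaries.append({
                "ts": self.window_start,
                "count": self.count,
                "bytes": self.nbytes,
                "busy_s": self.busy,
            })
        self.window_start += self.interval
        self.count, self.nbytes, self.busy = 0, 0, 0.0


agg = IntervalAggregator(interval=1.0)
for i in range(10):
    # Ten 64 KiB writes, one every 250 ms, each taking 10 ms.
    agg.record(start=i * 0.25, duration=0.01, nbytes=65536)
agg.flush()
print(agg.summaries)
```

Only the summaries need to be sent on to the monitoring side, so the reporting volume depends on the interval, not on the number of I/O calls.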

Combining XSP, Periscope, NLMI

[Diagram labels: GridFTP server (XIO layer, XSP layer, XIO layer); NLMI host stats; xspd; signaling; XIO performance; Periscope; perfSONAR services; perfSONAR protocols; Clients]

Visualization

Visualization cont.

Conclusions
  Periscope provides a platform for perfSONAR analysis
    Caching to reduce latency, centralized correlation
  Integration with XSP provides transparent monitoring and awareness of application state
  Still polling perfSONAR, though – Publish/Subscribe?

Guilty parties
  D. Martin Swany, Faculty, UD
  Ezra Kissel, Grad student, UD
  Ahmed El-Hassany, Grad student, UD
  Guilherme Fernandes, Grad student, UD

Questions

Contact: [email protected]

Extra slides

UNIS example
topology
  id : esnet
  domain
    id : urn:ogf:network:domain=ps.es.net
    node
      _id : urn:ogf:network:domain=ps.es.net:node=albu-cr1
      name : albu-cr1
      description : Juniper
      address
        type : hostname
        value : albu-cr1
      location
        latitude : +35.08
        longitude : -106.64

UNIS Example, cont.
      <unis:port id="urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=134.55.40.186">
        <unis:address type="ipv4">134.55.40.186</unis:address>
        <unis:address type="hostname">albucr1-sdn-a-albusdn1.es.net</unis:address>
        <unis:relation type="over">
          <unis:portIdRef>urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=ge-5/0/0</unis:portIdRef>
        </unis:relation>
        <unis:portPropertiesBag>
          <nmtl3:portProperties>
            <nmtl3:netmask>255.255.255.252</nmtl3:netmask>
          </nmtl3:portProperties>
        </unis:portPropertiesBag>
      </unis:port>
    </unis:node>
  </unis:domain>
</unis:topology>
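
A port record like the one on this slide can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`. The slide does not show the namespace declarations, so the URIs `urn:example:unis` and `urn:example:nmtl3` are placeholders added only to make the fragment parseable; they are not the real UNIS namespaces.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the slide does not show the real declarations.
NS = {"unis": "urn:example:unis", "nmtl3": "urn:example:nmtl3"}

PORT_XML = """
<unis:port xmlns:unis="urn:example:unis" xmlns:nmtl3="urn:example:nmtl3"
    id="urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=134.55.40.186">
  <unis:address type="ipv4">134.55.40.186</unis:address>
  <unis:address type="hostname">albucr1-sdn-a-albusdn1.es.net</unis:address>
  <unis:relation type="over">
    <unis:portIdRef>urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=ge-5/0/0</unis:portIdRef>
  </unis:relation>
  <unis:portPropertiesBag>
    <nmtl3:portProperties>
      <nmtl3:netmask>255.255.255.252</nmtl3:netmask>
    </nmtl3:portProperties>
  </unis:portPropertiesBag>
</unis:port>
"""

port = ET.fromstring(PORT_XML)

# Collect addresses by type, e.g. {"ipv4": ..., "hostname": ...}.
addresses = {a.get("type"): a.text for a in port.findall("unis:address", NS)}

# The "over" relation names the physical port this layer-3 port runs over.
runs_over = [
    ref.text
    for ref in port.findall("unis:relation[@type='over']/unis:portIdRef", NS)
]

netmask = port.findtext(".//nmtl3:netmask", namespaces=NS)
print(addresses, runs_over, netmask)
```

The `over` relation is what ties the layer-3 port back to the physical interface `ge-5/0/0` on the same node.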
