Post on 27-Mar-2015
page 1
Advanced Technology Laboratories
Network Performance Monitoring at Small Time Scales
Dina Papagiannaki, Rene Cruz, Christophe Diot
page 2
Motivation
• Network management for large-scale networks almost exclusively relies on SNMP.
• SNMP reports on aggregate link activity for the duration of the polling interval (5 mins).
• Operators provision their network around these values according to provider-specific “acceptable” utilization levels.
page 3
Questions
• Can one infer delay “degradation” from SNMP link counters?
• Can one infer delay performance from output link utilization?
– At what time scale should these measurements be taken?
• How do we summarize such high-resolution measurements in a 5-minute counter?
page 4
Terminology
A micro-congestion episode is a short-lived period in the lifetime of a link during which packets face increased delays due to cross traffic.
Metrics: Amplitude, Duration, Frequency
page 5
Measurement Data: Sampling the Output Queue
• Collect packet traces from links attached to the same router (set 1: OC-3, set 2: OC-12)
• Compute single-hop delay using GPS-accurate timestamps for arrival and departure
[Figure: router with attached OC-3 links]
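The single-hop delay computation can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes each trace has already been reduced to a mapping from a packet identifier to a timestamp in seconds, and the function name `single_hop_delays` and the identifier scheme are hypothetical.

```python
def single_hop_delays(arrivals, departures):
    """Return sorted (departure_time, delay) pairs for packets seen in both traces.

    arrivals, departures: dicts mapping a packet identifier (assumed unique,
    e.g. a hash of invariant header fields) to a timestamp in seconds.
    """
    delays = []
    for pkt_id, t_out in departures.items():
        t_in = arrivals.get(pkt_id)
        # Skip packets that were not observed on the monitored input link.
        if t_in is not None and t_out >= t_in:
            delays.append((t_out, t_out - t_in))
    delays.sort()
    return delays
```

Packets that appear only in the output trace (for example, those arriving on an unmonitored input link) are simply dropped in this sketch.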
page 6
Methodology
• Compute link throughput for non-overlapping time intervals of 1 ms, 10 ms, 100 ms, and 1 s duration
• Collect all delay samples for each interval
• Associate throughput level with delay distribution
[Figure: delay samples d1, d2 on the output link]
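The three steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: packets are assumed to be given as `(departure_time_us, size_bytes, delay_us)` tuples with integer microsecond timestamps, and the names `utilization_and_delays`, `interval_us`, and `capacity_bps` are hypothetical.

```python
from collections import defaultdict


def utilization_and_delays(packets, interval_us, capacity_bps):
    """Pair each interval's utilization with the delay samples observed in it.

    packets: iterable of (departure_time_us, size_bytes, delay_us) tuples.
    Returns {interval_index: (utilization, [delay_us, ...])}.
    """
    bytes_per_interval = defaultdict(int)
    delays_per_interval = defaultdict(list)
    for t_us, size, delay in packets:
        idx = t_us // interval_us  # non-overlapping intervals
        bytes_per_interval[idx] += size
        delays_per_interval[idx].append(delay)

    result = {}
    for idx, nbytes in bytes_per_interval.items():
        # bits sent divided by the bits the link could carry in one interval
        utilization = (nbytes * 8 * 1_000_000) / (interval_us * capacity_bps)
        result[idx] = (utilization, delays_per_interval[idx])
    return result
```

Running the same function with `interval_us` set to 1000, 10000, 100000, and 1000000 gives the four time scales listed on the slide. Intervals with no departures do not appear in the result.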
page 7
Delay performance (OC-3)
page 8
Instantaneous link utilization and delay
• Instantaneous link utilization may be high even when packets do not experience congestion!
[Figure annotation: no queueing delay]
page 9
5 minutes too long to capture micro-congestion
page 10
Inference of Duration and Frequency
• If a micro-congestion episode persists in time, it should be visible across time scales
• For each time scale τ, we count the number of intervals whose throughput exceeds the threshold θ
• Measure the fraction of overloaded intervals within each 5-minute interval
[Figure: output link]
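The counting step can be sketched as below. This is a minimal sketch under assumptions not stated on the slide: the input is a list of byte counts at the finest time scale covering one reporting window, coarser scales are obtained by summing consecutive groups of fine intervals, and θ is expressed as a fraction of link capacity. All function and parameter names are hypothetical.

```python
def aggregate(counts, factor):
    """Sum consecutive groups of `factor` fine-scale byte counts.

    A trailing partial group is dropped.
    """
    return [sum(counts[i:i + factor])
            for i in range(0, len(counts) - factor + 1, factor)]


def overloaded_fraction(counts, bytes_at_capacity, theta):
    """Fraction of intervals whose utilization exceeds theta (0..1)."""
    if not counts:
        return 0.0
    over = sum(1 for c in counts if c > theta * bytes_at_capacity)
    return over / len(counts)


def fractions_by_scale(fine_counts, fine_capacity_bytes, theta,
                       factors=(1, 10, 100, 1000)):
    """Overloaded fraction per time scale, keyed by aggregation factor."""
    return {
        f: overloaded_fraction(aggregate(fine_counts, f),
                               fine_capacity_bytes * f, theta)
        for f in factors
    }
```

With 1 ms fine-scale intervals, the default factors correspond to 1 ms, 10 ms, 100 ms, and 1 s. A burst that alternates between full and idle fine intervals is counted as overloaded at the fine scale but can fall below θ once summed at the coarser scale.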
page 11
Inference of Duration (cont'd)
• If the fraction of “congested” intervals exceeding θ at time scale τ+1 is greater than the fraction of “congested” intervals at time scale τ, then there are significant fluctuations at time scale τ.
[Figure label: reporting interval]
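The comparison rule above can be written down directly. In this sketch, `fractions[i]` is assumed to hold the fraction of congested intervals at time-scale index i for one reporting interval, in the same order as the index τ on the slide; the function name is hypothetical.

```python
def scales_with_fluctuations(fractions):
    """Return the indices tau for which fractions[tau + 1] > fractions[tau].

    fractions: per-time-scale congested fractions for one reporting interval,
    ordered by the time-scale index used in the rule.
    """
    return [tau for tau in range(len(fractions) - 1)
            if fractions[tau + 1] > fractions[tau]]
```

For example, `scales_with_fluctuations([0.1, 0.3, 0.2, 0.2])` flags only index 0, the one place where the next scale's fraction is strictly larger.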
page 12
Duration/Frequency
page 13
Summary
• 5-minute average utilization measurements can hide micro-congestion episodes.
• There is no unique time scale that captures micro-congestion.
• Impact needs to be studied at multiple time scales simultaneously.
• New metric to address network performance monitoring at small time scales.
page 14
Ongoing Work
• We need to identify the impact of
– Link capacity
– Traffic arrival pattern
• We have instrumented an entire router and are analyzing busy periods.