
Routing Dynamics in Simultaneous Overlay Networks

Mukund Seshadri, Randy Katz (mukunds@cs.berkeley.edu, randy@cs.berkeley.edu)

Berkeley-Helsinki Short Course, Aug. 2003

Problem

Consider overlay routing when multiple independent overlay networks/flows interact: can this be unstable or inefficient?

Identify such scenarios. Suggest improvements.

Identify scope for reduction of measurement overhead.

General Motivation

End-host controlled routing can become significant:
  Pure overlay network protocols (RON[3], Detour[4], ESM[5])
  Overlay primitives (“Path reflection”[1], i3-based [2])
  Better routing than Internet/BGP (resilience/performance/multicast/etc.)

What if several entities set up their own overlays?
  Companies setting up distribution overlay networks…
  Or, more ad-hoc users setting up overlay networks…
  Flows within a single overlay…

Consider overlay networks/flows which have some physical links in common, but don’t explicitly coordinate with each other.

Unstable Routing Example

L1 failure can cause synchronized oscillation of both flows between the two alternate paths

[Figure labels: Sources, Destinations, Ov. Nw. Nodes (2 Ovns), Primary Paths, Alternate Paths, Bottleneck Phy. Link. Link capacities: L1 = 2 Mbps, L2 = 1+ Mbps, L3 = 1 Mbps.]
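The synchronized oscillation in the unstable routing example can be reproduced with a toy model. The sketch below is not the authors' simulator: it assumes two flows that each need 1 Mbps, takes the "1+ Mbps" label for L2 as 1.2 Mbps, and assumes both flows re-evaluate at the same instant from the same snapshot. The names `available`, `greedy_round`, and `simulate` are hypothetical.

```python
# Toy model of the two-flow example after L1 has failed.
# Assumptions (not from the slides): L2 = 1.2 Mbps stands in for "1+ Mbps",
# each flow demands 1.0 Mbps, and both flows decide simultaneously.
CAPACITY = {"L2": 1.2, "L3": 1.0}  # Mbps on the two alternate paths
DEMAND = 1.0                       # assumed per-flow demand in Mbps


def available(path, routes, me):
    """Bandwidth that flow `me` would see on `path`, given the other flows' routes."""
    others = sum(DEMAND for flow, p in routes.items() if flow != me and p == path)
    return max(CAPACITY[path] - others, 0.0)


def greedy_round(routes):
    """Every flow picks its best path from the same snapshot, with no coordination."""
    return {
        flow: max(CAPACITY, key=lambda p: available(p, routes, flow))
        for flow in routes
    }


def simulate(rounds=6):
    """Return the route assignment after each synchronized decision round."""
    routes = {"flow_a": "L2", "flow_b": "L2"}  # both jump to the larger alternate path
    history = [dict(routes)]
    for _ in range(rounds):
        routes = greedy_round(routes)
        history.append(dict(routes))
    return history


if __name__ == "__main__":
    for step, routes in enumerate(simulate()):
        print(step, routes)
```

In this toy setting both flows always see the other path as empty, so they move together and never settle. Desynchronizing the decision times or adding a switching threshold breaks the symmetry.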

Focus

Main application – multimedia streams
  Long-lived (medium) flows: ~1 hr (5 min).
  Flows require specified bandwidth levels.
  Flows require route stability (packet reordering and jitter are undesirable).
Secondary application – long, high-volume transfers/sessions.
Problem considered: selection of best routes (not location/DHTs).

Size: 50-500 overlay flows; 10-50 nodes each.

Independent decision makers: no explicit information sharing.
  Unlike PlanetLab[6], the underlay[7] model, and the i3-based solution[2].
  Independent administration might be desirable.
  Don’t have to wait for infrastructure nodes to come up.
  Most protocols like ESM can’t scale to thousands of nodes.

Overlay Network Model

Given M overlay networks/flows with N nodes each:
  Probing of all potential paths is done (O(N) cost).
  Path characteristics are inferred from probes in some time window, with some error factor.
  We consider only bandwidth.

Best path is selected to send traffic on (GREEDY).
  Route change is based on a bandwidth improvement threshold (H).

Path-level simulator
  Characterizes shared bottleneck links.
  The level of sharing is characterized by “path density”.
  Unicast CBR flows with bandwidth requirement.

Metrics of interest
  Loss rate (related to bandwidth)
  Stabilization time
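The GREEDY rule with a bandwidth improvement threshold can be sketched as follows. This is an illustration under assumptions, not the simulator used in the talk: H is treated as a fractional improvement over the current path (the slides do not say whether H is relative or absolute), measurement error is modeled as uniform multiplicative noise, and the names `noisy_probe` and `greedy_select` are hypothetical.

```python
import random


def noisy_probe(true_bw, error=0.10, rng=random):
    """Assumed error model: measured bandwidth is off by up to +/- `error` (fractional)."""
    return true_bw * rng.uniform(1.0 - error, 1.0 + error)


def greedy_select(current, measured_bw, h):
    """Pick the next path under GREEDY with hysteresis.

    current:     path in use now, or None if no path has been chosen yet
    measured_bw: dict mapping path -> estimated available bandwidth
    h:           assumed to be a fractional improvement threshold (0.1 = 10%)
    """
    best = max(measured_bw, key=measured_bw.get)
    if current is None:
        return best
    # Switch only if the best path beats the current one by more than h.
    if measured_bw[best] > measured_bw[current] * (1.0 + h):
        return best
    return current


if __name__ == "__main__":
    true_bw = {"p1": 1.0, "p2": 1.05, "p3": 0.8}
    measured = {p: noisy_probe(bw) for p, bw in true_bw.items()}
    print(greedy_select("p1", measured, h=0.1))
```

With h = 0 every small measured improvement triggers a re-route, which is the unstable case; a larger h trades responsiveness for stability.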

Contribution

Study the need for “restraint” in route selection:
  Randomness in selection
  Hysteresis
  Time between re-route decisions

Hysteresis Required

No hysteresis threshold (H) for route change => unstable. We will use 99% stabilization time.

H affects loss rate…

Will explore more later in the talk…

When does Greedy “fail”?

Large flows => more effect when re-routed => lower stability

Defaults:
  500 overlay flows, 50 bottleneck links
  Link capacities ~ flow requirements
  ~50% cross-traffic
  10% measurement error
  4x variation in link bandwidth
  ~25 links/flow (density)
  Optimal threshold assumed

When does Greedy “fail”?

High sharing => many route changes. High sharing can arise:
  for flows within a single overlay;
  when overlay nodes are skewed towards certain ASes, such as universities;
  if several overlay flows independently use a medium-size shared infrastructure.

Cross-Traffic

High Cross-Traffic causes the effect of overlay flows on available bandwidths to be lower, so greedy is more stable.

Other factors investigated: routing window variation, measurement error, excess capacity, bandwidth distribution.

Summary of “Greedy”

The following factors contribute to poor stability and performance of “Greedy” overlay path selection:
  Several flows’ paths share a large number of bottleneck links.
  There is not much spare capacity in the paths used.
  There is a large variation in link and flow bandwidths.
  The overlay traffic is a high fraction of the traffic on the bottleneck links.
  Each flow’s bandwidth is significant compared to the bottleneck link bandwidth.

Improvements to Greedy

Randomly select the path to be chosen:
  ARAND: In proportion to available bandwidths
  SRAND: Best of a randomly selected subset of size S
    …in proportion to capacity
    Reduces measurement overhead
    Works well for server load balancing [8] (but different work model: jobs arrive and leave, and are assigned to only one server for their lifetime)
  GRAND: Randomly select from the best S paths
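The three randomized selectors can be sketched as below. This is a minimal reading of the slide text, not the authors' code. In particular, the SRAND subset is drawn uniformly here, whereas the slide hints at a capacity-proportional variant; the function names and the `probe` callback are hypothetical.

```python
import random


def arand(bw, rng=random):
    """ARAND: pick a path with probability proportional to its available bandwidth.

    bw maps path -> available bandwidth; at least one value must be positive.
    """
    paths = list(bw)
    weights = [bw[p] for p in paths]
    return rng.choices(paths, weights=weights, k=1)[0]


def srand(paths, probe, s, rng=random):
    """SRAND: probe only a random subset of size s and take the best of it.

    `probe` is a callable path -> measured bandwidth. Only s paths are probed,
    which is where the reduction in measurement overhead comes from.
    The subset is sampled uniformly here (an assumption).
    """
    candidates = list(paths)
    subset = rng.sample(candidates, min(s, len(candidates)))
    return max(subset, key=probe)


def grand(bw, s, rng=random):
    """GRAND: pick uniformly at random among the s best paths."""
    top = sorted(bw, key=bw.get, reverse=True)[:s]
    return rng.choice(top)


if __name__ == "__main__":
    bw = {"p1": 2.0, "p2": 1.5, "p3": 0.5, "p4": 1.0}
    rng = random.Random(42)
    print(arand(bw, rng), srand(bw, bw.get, 2, rng), grand(bw, 2, rng))
```

Note that ARAND and GRAND still need a bandwidth estimate for every path, while SRAND needs only s probes per decision.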

Does Randomizing Help?

Randomization is more useful at high densities. More stable, lower loss, less sensitive to threshold setting.

Hysteresis Threshold

Optimal value of H is very sensitive to parameters.

Flows can automatically discover the values of H.
  Flows can independently “probe” values of H:
    No route change => decrease H
    Route change => increase H
  Try AIAD, MIMD, etc.
  Can perform even better than with fixed H…
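The "probe H" rule (decrease H when no route change occurs, increase it after a route change) can be sketched for the AIAD and MIMD variants. The step size, multiplicative factor, and clamping bounds below are illustrative assumptions, not values from the talk, and `adapt_h` is a hypothetical name.

```python
def adapt_h(h, route_changed, scheme="AIAD",
            step=0.01, factor=1.5, h_min=0.001, h_max=1.0):
    """Return the next hysteresis threshold for one flow.

    route_changed: True if the flow re-routed in the last decision interval.
    scheme:        "AIAD" (additive increase/decrease) or
                   "MIMD" (multiplicative increase/decrease).
    step, factor, h_min, h_max are assumed tuning values.
    """
    if scheme == "AIAD":
        h = h + step if route_changed else h - step
    elif scheme == "MIMD":
        h = h * factor if route_changed else h / factor
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Keep H in a sane range; h_min > 0 so MIMD can always grow again.
    return min(max(h, h_min), h_max)


if __name__ == "__main__":
    h = 0.1
    for changed in [True, True, False, False, False]:
        h = adapt_h(h, changed, scheme="MIMD")
        print(round(h, 4))
```

Each flow runs this update on its own, so no information is shared between overlays.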

Exploring “H”

Very similar; MIMD stabilizes slightly quicker… Increase/decrease (I/D) parameters are not as sensitive to simulated network parameters.

Exploring “H” (Contd.)

Performs much better than with fixed threshold, loss rates close to 0

Stabilization times similar to fixed case.

Summary

SRAND is as good as or better than GREEDY in most cases.
  Measurement costs lowered, with performance similar to the proportional randomization method.
Automatic discovery of H works better than fixed H (and is more feasible).
Increasing time windows can help, particularly when flows arrive/depart.

Future Work

Define a general method that combines randomization, hysteresis estimation, and time variation (like simulated annealing).

Explore dynamic scenarios (flows arrive/depart).
Explore a second-level control loop for the MIMD parameters.
Implement/simulate using real topologies.

Can we define a general notion of “friendliness” pertaining to both route selection and traffic distribution over different routes?

References

1. Network Layer Support for Overlay Networks – John Jannotti – OpenArch 2002.
2. Infrastructure Primitives for Overlay Networks – Karthik Lakshminarayanan et al. – under submission.
3. Resilient Overlay Networks – Andersen et al. – SOSP 2001.
4. Detour: a Case for Informed Routing and Transport – Savage et al. – IEEE Micro, Jan 1999.
5. A Case for End System Multicast – Yang-hua Chu et al. – JSAC 2002.
6. PlanetLab – http://www.planet-lab.org
7. A Routing Underlay for Overlay Networks – Nakao et al. – Sigcomm 2003.
8. How Useful is Old Information – M. Mitzenmacher – PODC 1997.
9. An Analysis of Internet Content Delivery Systems – Saroiu et al. – OSDI 2002.

…Backup Slides…

Stabilization Times of the *RANDs

Generally SRAND and ARAND stabilize quickly and have a very low loss rate.

Also investigated the effect of subset size on SRAND

Other Factors

A small amount of cheating doesn’t hurt the good flows; a large amount does.

If link bandwidths are much higher than flow bandwidths, Greedy is more stable and performs better.

If link and flow BW are similar, then a high variation in the same causes Greedy to be fairly unstable.

Extra Slide

2-Flow Illustration

We can randomize:
  Route selection (proportional to available BW)
  Time intervals of assessment and rerouting

2-Flow StrawMan

[Figure/table residue. Panel titles: Synced Oscillations, Desynced Rerouting. Column headers: Scenario, Stabiliz… (truncated). Rows: Greedy, Proportional.]
