www.ict.csiro.au
End-2-End QoS Internet
Presented by: Zvi Rosberg
3 Dec, 2007
Caltech Seminar
What is this talk about
• The shortcomings of QoS support in the current Internet
• A novel holistic Rate Management Protocol
• A new scalable QoS guarantee architecture
• The theoretical foundation of our architecture
• How TCP window flow control may adapt in the presence of our network layer RMP
• Another E-2-E prioritized Delay/Loss RMP
Motivation
Shortcomings of the current QoS architecture
Besides being immature and requiring horrendous configuration, current QoS also has…
Fundamental inhibitors:
1. Scalability for real QoS guarantee (IntServ and Cisco’s “IntServ over DiffServ”)
2. Neither bandwidth nor E2E delay guarantees when using a scalable configuration of DiffServ
So what are we doing about it?
We are implementing a prototype on Network Processors (NPUs) that addresses the current QoS issues. The architecture:
1. Is scalable and has bandwidth, loss and E2E delay guarantees
2. Is adaptive, so configuration is minimized
3. Allocates the residual bandwidth fairly
The NPUs execute a new IP layer protocol that routers should run in the future
The Architecture
The Key Elements of our solution
• Novel Rate Management Protocol (RMP) for Multi-Service Flows
• Runs in Edge & Core Routers at IP layer
• Provides Services to Management functions in the Edge Routers
[Figure: Edge Routers 10, 20 and 30, each with Ethernet-attached User Devices, and Core Routers 11 to 15. Every router is labeled RMP; each Edge Router is labeled Services.]
Architectural Components
• QoS Fair Rate Calculation
• RMP
• Link Penalties Gathering
• Performance Probing
• Admission Control
• Scalable Bandwidth Reservation Protocol
• Classification/Marking at Edge Routers
• Rate Policing in the Edge
• Priority Packet Scheduling in Routers
[Diagram: the components are grouped into a Control Plane and a Data Plane.]
Theoretical Foundation
Our Theoretical Contribution
• Extending Fairness beyond “best-effort” service
• Extending the primal-dual iterative distributed algorithm (used by Kelly) for rate allocation with
1. Rate and delay constraints
2. Priority packet scheduling
• Revisiting TCP flow control when rate is controlled by the network layer
• An aside question is: Why priority scheduling?
It improves link utilization: delay-sensitive packets do not have to wait behind delay-insensitive packets, so the link can carry more delay-insensitive traffic
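
The scheduling idea above can be illustrated with a minimal sketch. It assumes a non-preemptive strict-priority discipline in which priority 1 is served first; the talk does not specify the exact discipline, and the class and packet names are hypothetical.

```python
from collections import deque


class StrictPriorityScheduler:
    """Non-preemptive strict-priority output queue (assumed discipline).

    Priority 1 is the most delay-sensitive class and is always served first.
    """

    def __init__(self, n_priorities):
        # One FIFO queue per priority class.
        self.queues = [deque() for _ in range(n_priorities)]

    def enqueue(self, priority, packet):
        self.queues[priority - 1].append(packet)

    def dequeue(self):
        # Serve the first non-empty queue, scanning from highest priority.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # Nothing to send.


sched = StrictPriorityScheduler(n_priorities=3)
sched.enqueue(3, "bulk-1")
sched.enqueue(1, "voice-1")
sched.enqueue(3, "bulk-2")
order = [sched.dequeue() for _ in range(3)]
print(order)
```

The delay-sensitive packet leaves first even though it arrived after a bulk packet, which is what lets the bulk class fill the remaining capacity without hurting the delay-sensitive class.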
Fairness with Best-effort
- proportional fairness is equivalent to the solution of:
as long as X is convex
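
For reference, one standard way to write such a problem is the weighted family of Mo and Walrand, which reduces to Kelly's proportional fairness for α = 1. Whether this is exactly the formula shown on the slide is an assumption; the symbols x_i (rate of flow i), p_i (weight) and α are introduced here for illustration, and X is the feasible rate set.

```latex
\max_{x \in X} \; \sum_{i} p_i \, \frac{x_i^{\,1-\alpha}}{1-\alpha}
\quad (\alpha \neq 1),
\qquad
\max_{x \in X} \; \sum_{i} p_i \log x_i
\quad (\alpha = 1)
```

Because the objective is strictly concave, a convex X gives a unique maximizer, which is why convexity is the condition that matters.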
Fairness with QoS
A natural way to extend the best-effort fairness is to add the QoS requirements to the constraints and optimize on the residual link capacities
Fairness with QoS (Cont.)
• Flow rates of prio 1,2,…,m traversing each link
• maximum loss and delay constraints
• minimum bandwidth constraints
Since X is convex, proportional fairness follows
Fairness with QoS (Cont.)
The delay/loss constraints are NOT EXPLICIT: they are attained by an outer-loop control of the per-priority utilization upper bounds
Primal-dual iterative distributed algorithm extension
• The fair residual rates are computed iteratively after a reduction to residual link capacities
• … which is made possible by our scalable reservation protocol
• The policed rate of each flow is then obtained from its fair residual rate
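
The iteration described above can be sketched as follows. This is a minimal sketch under several assumptions that the slide does not confirm: log utilities (proportional fairness), a plain dual-gradient price update, residual capacity equal to link capacity minus the reserved minimum rates, and a policed rate equal to the reserved minimum plus the fair residual rate. All function and parameter names are hypothetical.

```python
def fair_residual_rates(routes, capacity, min_rate, step=0.01, iters=3000):
    """Return (residual capacities, fair residual rates, policed rates).

    routes[i] lists the link indices used by flow i.
    """
    n_links = len(capacity)

    # Residual capacity: what is left once reserved minimum rates are removed.
    residual = list(capacity)
    for i, links in enumerate(routes):
        for l in links:
            residual[l] -= min_rate[i]
    if min(residual) <= 0:
        raise ValueError("reservations exceed link capacity")

    price = [1.0] * n_links
    rates = [0.0] * len(routes)
    for _ in range(iters):
        # Source step: with log utility, rate = 1 / (sum of prices on route).
        rates = [1.0 / sum(price[l] for l in links) for links in routes]
        # Link step: raise the price when load exceeds residual capacity.
        load = [0.0] * n_links
        for i, links in enumerate(routes):
            for l in links:
                load[l] += rates[i]
        price = [max(1e-6, price[l] + step * (load[l] - residual[l]))
                 for l in range(n_links)]

    # Assumed policing rule: reserved minimum plus fair share of the residual.
    policed = [min_rate[i] + rates[i] for i in range(len(routes))]
    return residual, rates, policed


# Two links of capacity 10; flow 0 crosses both, flows 1 and 2 use one each.
residual, rates, policed = fair_residual_rates(
    routes=[[0, 1], [0], [1]], capacity=[10.0, 10.0], min_rate=[1.0, 2.0, 2.0])
print(residual, [round(r, 3) for r in rates], [round(r, 3) for r in policed])
```

In this toy case each link has residual capacity 7, and the two-link flow settles at one third of it while each single-link flow gets two thirds, the usual proportionally fair split.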
The Rate Management Protocol (RMP)
• In each router output link n and priority m:
• Total rate of flows from priorities 1,..,m on link n on unreserved link capacity
• Link capacity reduced by utilization upper bound per priority class m
• Adaptively set from sources based on RTT and Loss probing
• Route penalty of flow i
Stability Proof
To prove stability with fixed utilization upper bounds:
• We redefine the routing matrix to include one virtual link for each priority class
• Flows with priority m use all virtual links having priorities ≥ m along their original path
• The redefined problem is a single class problem equivalent to the priority problem
• After this reduction, stability follows by Kelly’s results
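
The virtual-link reduction can be made concrete with a small sketch. It assumes that a flow of priority k loads every virtual link of priority m ≥ k on its path, which matches the RMP description of a per-(link, priority) constraint on the total rate of priorities 1,…,m. The function name and the index layout are hypothetical.

```python
def expand_routing(routes, priority, n_links, n_priorities):
    """Build the redefined routing matrix with one virtual link per
    (physical link, priority class) pair.

    routes[i]   : physical links used by flow i
    priority[i] : priority of flow i, 1 = highest
    Row index of virtual link (l, m) is l * n_priorities + (m - 1).
    """
    n_flows = len(routes)
    matrix = [[0] * n_flows for _ in range(n_links * n_priorities)]
    for i, links in enumerate(routes):
        for l in links:
            # Assumption: flow i counts against classes m >= its own priority.
            for m in range(priority[i], n_priorities + 1):
                matrix[l * n_priorities + (m - 1)][i] = 1
    return matrix


# One physical link, two priority classes, one flow in each class.
R = expand_routing(routes=[[0], [0]], priority=[1, 2],
                   n_links=1, n_priorities=2)
print(R)
```

The first row is the priority-1 virtual link, used only by the priority-1 flow; the second row is the priority-2 virtual link, used by both flows. The result is an ordinary single-class routing matrix.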
Stability Proof (cont.)
To prove stability with adaptive utilization upper bounds:
• “Unhappy” flow sources (having excessive delay/losses) signal it in their RMP packets
• Congested links decrease the respective utilization upper bounds
• To prove convergence, we allow the bounds only to decrease
• In practice, convergence is also observed when the bounds are increased for flow sources that are “too-happy”
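
The outer loop above might look like the following sketch for one congested link. The multiplicative decrease factor, the floor, and the idea that a link lowers only the bound of the signalling class are assumptions made for illustration; the talk states only that congested links decrease the respective bounds.

```python
def adapt_bounds(bounds, unhappy_classes, decrease=0.95, floor=0.05):
    """Return updated utilization upper bounds for one congested link.

    bounds[m - 1]    : current bound of priority class m
    unhappy_classes  : classes whose sources signalled excessive delay/loss
    Only decreases are applied, mirroring the convergence argument.
    """
    updated = []
    for m, bound in enumerate(bounds, start=1):
        if m in unhappy_classes:
            bound = max(floor, bound * decrease)  # hypothetical step rule
        updated.append(bound)
    return updated


# Start from the bounds used in the QoS simulation; class 2 is unhappy.
new_bounds = adapt_bounds([0.1, 0.75, 1.0], unhappy_classes={2})
print(new_bounds)
```

Lowering a class's bound shrinks the capacity seen by the rate allocation for that class, which reduces queueing for the flows that complained.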
TCP Flow Control - Revisited
TCP Flow Control Re-evaluation
• Once RMP is in place, TCP flow control needs re-evaluation
• The RMP of the core network will take care of fair rate calculation and congestion avoidance
• RMP will also signal end applications about their current target rates, and then… TCP could be extended beyond “best-effort”
• Given a target rate, TCP can achieve it with a window update of the form:
Performance Evaluation
• We showed that, assuming linear scalability, the window flow control converges to a unique stable state under totally asynchronous updates
• Linear scalability: the total number of bytes queued in each link scales up linearly with the window size
• It is an average property of the flows crossing a given link, rather than a per-flow property
• Plausible for large networks
• Stability was also verified by simulation
• In the fluid model of [Mo & Walrand] used to relate rate and windows, linear scalability is implied
TCP Flow Control Comparison
Epoch ISP Network, USA
# core links: 74 (37 full-duplex)
# flows: 512
# access links: 512
Core link capacity: 1 Gb/s
Access link capacity: 0.1 Gb/s
Simulation Method
• 2-way TCP flows using fixed shortest paths
• ACKs are either piggybacked or pure (statistically)
• RTO is estimated according to RFC 2988 (Jacobson Alg)
• Duplicate ACKs are triggered if …
• All TCP flow controls halve their window size upon 3 duplicate ACKs and reduce it to 2 MSS upon RTO
• Otherwise, Fast TCP adapts its window sizes according to …
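
The RTO estimation and loss reaction listed above can be sketched as follows. The smoothing constants and the 3 s initial and 1 s minimum RTO follow RFC 2988; the clock granularity value, the class and function names, and the event labels are assumptions for illustration, and the simulator's actual code is not shown in the talk.

```python
class RtoEstimator:
    """Retransmission timeout estimator following RFC 2988."""

    ALPHA = 1 / 8   # SRTT gain
    BETA = 1 / 4    # RTTVAR gain
    K = 4           # variance multiplier

    def __init__(self, granularity=0.1, min_rto=1.0):
        self.granularity = granularity  # clock granularity G (assumed value)
        self.min_rto = min_rto          # RFC 2988 rounds RTO up to 1 second
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0                  # initial RTO before any measurement

    def update(self, rtt):
        """Feed one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR must be updated before SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.min_rto,
                       self.srtt + max(self.granularity, self.K * self.rttvar))
        return self.rto


def window_after_loss(window, mss, event):
    """Window reaction used by all compared flow controls in the simulation."""
    if event == "triple_dupack":
        return window / 2      # halve on 3 duplicate ACKs
    if event == "rto":
        return 2 * mss         # collapse to 2 MSS on timeout
    return window


est = RtoEstimator()
first = est.update(2.0)
second = est.update(2.0)
print(first, second, window_after_loss(20000, 1460, "rto"))
```

A steady RTT makes the variance term decay, so the RTO tightens toward the smoothed RTT over successive samples.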
Simulation Method (cont.)
• Simulation time is about 3.5 real operational minutes
• In every step, window packets are processed in one batch
• First, they are arbitrarily distributed between forward and backward paths
• Then, the packets that can “fill” the links are in transit
• The rest are distributed between the bottleneck links in proportion to the bottleneck queueing time
• Async operation is modelled by i.i.d. Bernoulli r.v.'s determining which of the flows receive an ACK
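
The asynchronous update model in the last bullet can be sketched in a few lines. The ACK probability and the seed are assumed values chosen for illustration; the talk does not give the parameter used in the simulation.

```python
import random


def flows_receiving_ack(n_flows, p_ack, rng):
    """Draw one i.i.d. Bernoulli(p_ack) variable per flow and return the
    indices of the flows that receive an ACK in this simulation step."""
    return [i for i in range(n_flows) if rng.random() < p_ack]


rng = random.Random(2007)                  # fixed seed for repeatability
acked = flows_receiving_ack(n_flows=512, p_ack=0.5, rng=rng)
print(len(acked), "of 512 flows update their window in this step")
```

Only the flows in the returned list would run their window update in a given step, so different flows act on information of different ages.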
TCP Flow Control Comparison
[Figure: Our TCP Flow Control (window sizes of 9 typical flows)]
TCP Flow Control Comparison
[Figure: Fast TCP Flow Control]
TCP Flow Control Comparison
[Figure: TCP Vegas Flow Control]
TCP Flow Control Comparison
[Figure: TCP Reno Flow Control (“Sawtooth”)]
Comparison Summary (P = packets)
        Avg Rate   Avg RTT   Avg Win   Fairness Dev   Max Fair Dev
Ours    492 P      191 ms    28 P      3%             20%
Fast    479 P      231 ms    28 P      5%             25%
Vegas   449 P      248 ms    29 P      4%             44%
Reno    451 P      548 ms    59 P      12%            91%
Flow Control with QoS Support
             Avg Rate   Avg RTT   Avg Win
Priority 1   43.8 P     50 ms     1 P
Priority 2   224 P      56 ms     5.12 P
Priority 3   225 P      81 ms     7 P
• 3 x 256 2-way TCP connections with 3 priorities
• Utilization upper bounds: (0.1, 0.75, 1.0)
• Avg total fair rate: 164.30 packets (compared with 492)
• Avg Fairness deviation: 5.5%
Simulation with Link Utilization Adaptation
When the utilization upper bounds are adapted based on the RTT and losses experienced by the flow sources (i.e., RTT > RTO), then all QoS requirements are met
Another E2E Delay-Loss Control
Rate Time Derivative in the Fluid Model
• clearance time of bits from flows with prio higher/equal p in link l at time t
• delay prices for flow i at time t
We study the following prioritized combined Rate-Delay control problem
Delay Time Derivative in the Fluid Model
• total rate of flows with priorities less/equal p in link l at time t
The rate control is the gradient search of
Delay Prices Adapting
• … is learned by the flow source from the RMP packets
• … and is adapted if …
• Adaptation signals must also be disseminated to other relevant sources
• … which is done again with RMP signalling packets
Result Summary
If the routing matrix is full-rank, then
For any e2e delay requirement, there is a unique equilibrium point
The adaptive rate control converges to the stable point from any initial condition
Synchronous Fluid Model
Time Lag Fluid Model (Rate and Delay effects)
For a single bottleneck case – global stability holds true only if time lag is limited (e.g., ~650 ms)
Emulation – holds true for multiple bottlenecks
Thank You