Page 1: Intel Research Adaptive Load Sharing for Multiprocessor Network Nodes Lukas Kencl Intel Research Cambridge UCL, November 12, 2003.

Adaptive Load Sharing for Multiprocessor Network Nodes

Lukas Kencl

Intel Research Cambridge

UCL, November 12, 2003

Page 2:

Outline

• Adaptive load sharing method
  • Flow-to-processor mapping
  • Adaptation with minimal disruption
  • Method validation and application

• Research: Further methods
  • Adaptive data structures
  • Dynamic code reconfiguration

• Outlook: Adaptive methods in networking

Page 3:

Adaptive Load Sharing for Multiprocessor Network Nodes

Ph.D. work: IBM Zurich Research & EPFL, Lausanne

2 parts:
• Flow-to-processor mapping
• Adaptation with minimal disruption

Page 4:

Multiprocessor Network Node as a Load Sharing System

Assumptions:
• Data arrives in packetized flows.
• Any processor can process any packet.
• Heterogeneous processor capacity μj.

[Figure: incoming packets from multiple (N) inputs 1..N are distributed across multiple (M) processors 1..M.]

Task:
• Keep the load on the processors within some measure of balance.
• Map the same flow to the same processor (avoids reordering, preserves context).

Advantage: system optimization. Drawback: complexity, overhead.

Page 5:

Acceptable Load Sharing as the Measure of Balance

Processing load on processor j: λj(t)
Capacity of processor j: μj
Workload intensity on processor j: ρj(t) = λj(t) / μj
Total system workload intensity: ρ(t) = Σj λj(t) / Σj μj

“No single processor is overutilized if the system in total is not overutilized, and vice versa.”

Acceptable load sharing:

if ρ(t) ≤ 1 then ∀j, ρj(t) ≤ 1,
if ρ(t) > 1 then ∀j, ρj(t) > 1.

Acceptable load sharing minimizes packet loss probability!
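To make the definition concrete, here is a minimal sketch (not from the slides) of the acceptable-load-sharing predicate; the inputs lam and mu are hypothetical per-processor loads λj and capacities μj.

```python
# A minimal sketch of the "acceptable load sharing" predicate defined above;
# lam[j] is the load and mu[j] the capacity of processor j (made-up inputs).

def acceptable(lam, mu):
    """True iff no processor is overutilized unless the whole system is."""
    rho_sys = sum(lam) / sum(mu)               # total workload intensity
    rho = [l / m for l, m in zip(lam, mu)]     # per-processor intensities
    if rho_sys <= 1:
        return all(r <= 1 for r in rho)        # nobody overloaded
    return all(r > 1 for r in rho)             # everybody overloaded

print(acceptable([0.5, 0.7], [1.0, 1.0]))  # True: system and nodes under 1
print(acceptable([1.3, 0.2], [1.0, 1.0]))  # False: one node over, system under
```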

Page 6:

Minimizing Disruption

Goal: acceptable load sharing without maintaining flow state information, and yet minimizing the probability of mapping disruption (flow remapping).

NP-complete problem (Integer Linear Programming):

maximize  Σv εv(t) · Σj ( 1{f(t−Δt)(v) = j} · 1{f(t)(v) = j} ),

subject to  Σv av(t) · l(v) · 1{f(t)(v) = j} = λj(t) ≤ μj,  ∀j,

where
v – flow identifier vector in the packet header,
f(t)(v) – function mapping flows to processors, changing over time,
εv(t) ∈ {0,1} – indicator whether v has appeared in both intervals (t−2Δt, t−Δt) and (t−Δt, t),
av(t) – how many times v has appeared in the interval (t−Δt, t),
l(v) – load per packet carrying v,
Δt – iteration interval.

Even if we knew all the flow state information, this would remain an NP-complete problem, so heuristics are needed.

Page 7:

Flow-to-Processor Mapping

[Figure: an incoming packet carrying the flow identifier vector v arrives at one of the multiple (N) inputs and is dispatched to one of the multiple (M) processors.]

Weights vector x = (x1, ..., xM).

Upon packet arrival, a decision is made where to process the packet, based on the flow identifier and a set of weights. A flow-to-processor mapping f is thus established; in the figure, f(v) = 3.

Page 8:

Flow-to-Processor Mapping

Def.: Flow-to-processor mapping function f, f(v): V → M:

f(v) = j  ⟺  xj · g(v, j) = maxk { xk · g(v, k) },

where v is the flow identifier vector, x = (x1, ..., xm) is a weights' vector and g(v, j) ∈ (0, 1) is a pseudorandom function of uniform distribution.

Highest Random Weight (HRW) Mapping: Thaler & Ravishankar 1997; Ross 1998; CARP protocol; Windows NT load balancing.

[Figure: example with 3 processors of homogeneous processing capacity (weight xi = 1, ∀i); the values g(v, 1), g(v, 2), g(v, 3) are compared and the packet maps to the maximum, here processor 2.]
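As an illustration of the definition above, a minimal HRW sketch; SHA-256 is an assumption (the slides only require a fast, uniform pseudorandom g(v, j)), scaled into (0, 1).

```python
# A minimal HRW sketch, assuming SHA-256 (not specified on the slides) as
# the pseudorandom g(v, j) with output scaled into the open interval (0, 1).
import hashlib

def g(v: bytes, j: int) -> float:
    """Pseudorandom, uniform in (0,1), keyed on flow id v and processor j."""
    h = hashlib.sha256(v + j.to_bytes(4, "big")).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def hrw_map(v: bytes, x: list[float]) -> int:
    """f(v) = argmax_j x_j * g(v, j); processors indexed from 0 here."""
    return max(range(len(x)), key=lambda j: x[j] * g(v, j))

# Usage: three homogeneous processors (x_i = 1); same flow -> same processor.
x = [1.0, 1.0, 1.0]
print(hrw_map(b"10.0.0.1->10.0.0.2:80", x))
```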

Page 9:

HRW Mapping Favourable Properties

Minimal disruption of the mapping in case of processor addition. Example: add processor no. 4; vectors are mapped either (i) as before the addition or (ii) to the newly added processor, so a minimal number of vectors change mapping.

[Figure: g(v, j) compared for processors 1..3, then for 1..4 after adding processor 4; only flows whose maximum becomes g(v, 4) move.]

Load balancing over heterogeneous processors: the weights' vector x is in a 1-to-1 correspondence to p = (p1, ..., pm), the vector of traffic fractions received at each processor. The pseudorandom function g(v, j) ∈ (0, 1) can be implemented as a fast-computable hash function.
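A small experiment (illustrative only; the flow identifiers and counts are made up, and the SHA-256 stand-in for g is reused) that checks the minimal disruption property empirically when a fourth processor is added:

```python
# Checks: after adding processor 4, every flow either keeps its old
# processor or moves to the new one; with equal weights, roughly 1/4 move.
import hashlib

def g(v, j):
    h = hashlib.sha256(v + j.to_bytes(4, "big")).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def hrw_map(v, x):
    return max(range(len(x)), key=lambda j: x[j] * g(v, j))

flows = [f"flow-{i}".encode() for i in range(10_000)]
before = [hrw_map(v, [1.0] * 3) for v in flows]          # 3 processors
after  = [hrw_map(v, [1.0] * 4) for v in flows]          # add processor 4
moved = [(b, a) for b, a in zip(before, after) if b != a]
assert all(a == 3 for _, a in moved)   # remapped flows go only to the new one
print(f"{len(moved) / len(flows):.1%} of flows remapped")  # ~25% expected
```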

Page 10:

Adaptation through Feedback

[Figure: a Control Point (CP) connected to multiple (N) input cards and multiple (M) processors runs the feedback loop:]

1. Filter the workload intensity ρj(t), ∀j.
2. Evaluate ρ(t) = (ρ1(t), ρ2(t), ..., ρm(t)) (compare against the threshold).
3. Compute new x(t) = (x1(t), ..., xm(t)).
4. Download new x := x(t).

The trigger definition targets preventing overload if the system in total is not overloaded, and vice versa.

A threshold triggers adaptation when close to the load sharing bounds.

The flow-to-processor mapping f becomes a function of time, f(t)(v).

Adaptation may cause flow remapping! How to minimize the amount remapped?

Problem: incoming requests are packets, not flows! Packets are not evenly distributed over flows, hence not evenly distributed over the request object space, so the HRW mapping alone is not sufficient for acceptable load sharing bounds: we need to adapt!

Page 11:

Adaptation Algorithm

[Flowchart: Start → compute filtered processor workload intensity ρ(t) → trigger adaptation? If No, wait time Δt and repeat (Triggering Policy); if Yes, adapt the weights' vector x and upload it (Adaptation Policy), then repeat.]

Page 12:

Triggering Policy

Dynamic workload intensity threshold:

θ'(t) = 1/2 · (1 + ρ(t))

Triggering policy:
(i) if ρ(t) ≤ 1 and maxj ρj(t) > θ(t), then adapt;
(ii) if ρ(t) > 1 and minj ρj(t) < θ(t), then adapt.

Example: (ρ1(t), ρ2(t), ρ3(t)) = (0.8, 0.2, 0.2), ρ(t) = 0.4, θ(t) = 0.7; ρ1(t) > θ(t) ⇒ adapt.

Triggering threshold:

θ(t) = max(θ'(t), upper) (or, symmetrically, with the lower bound)

Hysteresis bounds:
upper: (1 + εH(t)) · ρ(t)
lower: (1 − εH(t)) · ρ(t)
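A hedged sketch of this triggering policy; equal capacities are assumed for brevity, and eps_h stands for the hysteresis width εH(t) as a plain parameter:

```python
# A minimal sketch of the triggering policy above; rho is the vector of
# filtered per-processor intensities (a hypothetical input).

def should_adapt(rho: list[float], eps_h: float = 0.0) -> bool:
    rho_sys = sum(rho) / len(rho)                  # system workload intensity
    theta = 0.5 * (1 + rho_sys)                    # dynamic threshold theta'
    if rho_sys <= 1:
        theta = max(theta, (1 + eps_h) * rho_sys)  # upper hysteresis bound
        return max(rho) > theta                    # rule (i)
    theta = min(theta, (1 - eps_h) * rho_sys)      # lower hysteresis bound
    return min(rho) < theta                        # rule (ii)

# The slides' example: rho = (0.8, 0.2, 0.2) -> rho_sys = 0.4, theta = 0.7.
print(should_adapt([0.8, 0.2, 0.2]))  # True: 0.8 > 0.7, so adapt
```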

Page 13:

Adaptation Policy: Minimal Disruption Example (3 Proc.)

• If the reduction is by a single, invariable multiplier, the processors whose weights are reduced receive less and the unaltered one receives more;
• minimal disruption of the mapping.

[Figure: with weights (x1, x2, x3), the values xj · g(v, j) are compared; after scaling x1 and x2 by 2/3 while x3 stays unchanged, only the flows whose maximum becomes x3 · g(v, 3) are remapped.]

Page 14:

Adaptation Policy: Minimal Disruption

• A, B: mutually exclusive subsets of M = {1, ..., m}, M = A ∪ B.
• c ∈ (0, 1).
• f, f': two HRW mappings with the weights' vectors x, x':

  x'j = c · xj,  j ∈ A,
  x'j = xj,  j ∈ B.

• pj, p'j: fraction of objects mapped to node j using f, f'.

Then:
1) p'j ≤ pj, j ∈ A,
   p'j ≥ pj, j ∈ B.
2) The number of objects mapped to a different node by each mapping is MINIMAL, that is, equal to |p'j − pj| · |V| at every node j.

Page 15:

Adaptation Policy

Let ρ(t) ≤ 1. Then:

xj(t) := c(t) · xj(t−Δt),  if ρj(t) > θ(t)  (j exceeds the threshold θ(t)),
xj(t) := xj(t−Δt),  if ρj(t) ≤ θ(t)  (j does not exceed the threshold θ(t)).

If ρ(t) > 1, the adaptation is carried out in a symmetrical manner.

The weights' multiplier coefficient c(t):

c(t) = ( θ(t) / min{ ρj(t) | ρj(t) > θ(t) } )^(1/m)

Factor c(t) is proportional to the minimal error and to the number of nodes.
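A minimal sketch of this adaptation step for the underloaded case (ρ(t) ≤ 1); rho, x and theta are the per-processor intensities, HRW weights and triggering threshold in the slides' notation, passed in as plain lists and floats:

```python
# A sketch of the weight adaptation above (underloaded case only).

def adapt_weights(x: list[float], rho: list[float], theta: float) -> list[float]:
    over = [r for r in rho if r > theta]       # processors above threshold
    if not over:
        return x                               # nothing to do
    m = len(x)
    c = (theta / min(over)) ** (1 / m)         # multiplier c(t), in (0, 1)
    return [c * xj if rj > theta else xj for xj, rj in zip(x, rho)]

# Usage with the earlier example: processor 1 exceeds theta = 0.7.
print(adapt_weights([1.0, 1.0, 1.0], [0.8, 0.2, 0.2], 0.7))
```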

Page 16:

Validation

Page 17:

Expectations

• Workload intensity on individual processors close to that of the system in total (acceptable load sharing);
• Packet loss probability lowered (acceptable load sharing);
• Persistent flows (appearing in two consecutive iterations) seldom remapped (minimize disruption).

Page 18:

“Realistic” Generated Traffic and Router System Model

[Figure: measured flow length cumulative distribution.]

Traffic characterization (approximated from various published OC-3 statistics):
• Number of packets per time interval;
• Number of flows per time interval;
• Measured flow length distribution, complemented by a Pareto distribution to generate the heavy tail;
• Identifier vector distribution;
• Per-packet processing load distribution;
• Maximal per-flow fraction of the interface rate f.

Router system model:
• 8 processors, 13 interfaces;
• System workload intensity close to 1;
• 3 alternatives to load sharing (LS):
  • Naive (no LS);
  • Static (LS with static weights);
  • Adaptive LS.

Page 19:

Adaptation Keeps Per-processor Workload Intensity Close to Ideal

[Figures: per-processor workload intensity over time for Naive (no LS), Static LS and Adaptive LS; max and min of all processors shown.]

Page 20:

Packet Loss Significantly Reduced with the Adaptive Control Loop

[Figures: packet loss for Naive, Static, Adaptive and Ideal; packet loss in excess of Ideal for Static and Adaptive.]

Adaptive load sharing saves on average 60% of the packets dropped in excess by the static load sharing.

Page 21:

Minimal Disruption Property Ensures Few Flow Remappings

[Figure: flows per iteration: appearing, persistent and remapped.]

The adaptive control loop leads on average to:
• less than 0.05% of the appearing flows remapped per iteration;
• less than 0.2% of the persistent flows remapped per iteration.

Page 22:

Applications and Implementation

Page 23:

Extension to Prevent Remapping

The minimal disruption property means only a small number of flows require special treatment. Which ones? Keep state? What treatment?

Which ones: during the transient period between the two mappings after an adaptation:
• Compute both mappings (OLD and NEW) for each packet.
• If the mappings differ, apply special treatment.

Treatment (sketched in code below):
• If a new flow (SYN packet), insert a classifier rule that maps to the new mapping.
• If an existing flow, insert a rule that maps to the old mapping.
• Monitor and delete terminated flows.
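An illustrative sketch of this transient-period logic, under assumptions the slides do not fix: flows keyed by an opaque identifier, the SYN flag as the new-flow signal, and a plain dict standing in for the hardware classifier. hrw_old and hrw_new stand for the OLD and NEW HRW mappings.

```python
# A sketch, not the PowerNP implementation: dispatch one packet during the
# transient period between the OLD and NEW mappings.
from typing import Callable

def dispatch(flow_id: bytes, is_syn: bool,
             hrw_old: Callable[[bytes], int],
             hrw_new: Callable[[bytes], int],
             rules: dict[bytes, int]) -> int:
    """Return the processor for a packet; 'rules' mimics the classifier."""
    if flow_id in rules:                  # classifier rule hit
        return rules[flow_id]
    old, new = hrw_old(flow_id), hrw_new(flow_id)
    if old == new:                        # mappings agree: no state needed
        return new
    target = new if is_syn else old       # new flows follow the new mapping,
    rules[flow_id] = target               # existing flows keep the old one
    return target
```

A monitor would delete entries from rules as flows terminate (FIN/RST or timeout), as the last bullet above notes.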

Page 24:

Server Load Balancer on the IBM PowerNP with Zero Remappings

[Figure: the IBM PowerNP sits in front of multiple (M) servers S1..SM, with a Control Point (CP). An incoming packet first hits the classifier: on a rule hit it follows the rule; on a rule miss the hash function is evaluated with both weight vectors. If HNEW = HOLD the packet is forwarded directly; if HNEW ≠ HOLD it is redirected to the CP, which determines whether it is a new or old flow, installs the matching rule (new flows via xNEW, old flows via xOLD), and sends collision flows directly.]

Page 25:

Adaptive Load Sharing: Summary

• hash-based;
• minimum state information;
• adaptive, yet minimum flow disruptions;
• the multiprocessor network node is transformed into a parallel computer;
• wide scope of applications (server LB, NP dispatcher, distributed router, etc.).

Page 26:

Research: Further methods

Page 27:

Adaptive lookup/classification on a multiprocessor system

• Splitting a large lookup table / rule base into several smaller consecutive sub-tables / rule bases;
• Each sub-table / rule base has a dedicated processor (microengine);
• Boundaries adaptively tuned according to the load on the processors;
• Sub-data structures adapting to the enforced traffic locality.

[Figures: an example lookup table and an example rule base, each spanning the key space 0 .. 2^32 − 1, partitioned among microengines ME0..ME3.]

Page 28:

Adaptive lookup table

• Table entries typically organized in a tree structure;
• Adapting to the traffic patterns: rebalancing the tree according to the hit distribution when updating the table;
• Problem: worst case vs. optimization.

[Figure: an example binary prefix tree (prefixes such as 00, 10, 100, 101, 10010, 10011) before and after rebalancing, annotated with hit counts and memory accesses (MA) per level.

Before: MemAccesses = 2·20 + 2·70 + 3·50 + 3·500 = 1830.
After promoting the 500-hit entry to depth 1: MemAccesses = 1·500 + 3·20 + 3·70 + 4·50 = 970.]
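The slide's arithmetic, restated as a tiny script: the cost of a lookup tree is the hit-weighted sum of the entry depths (memory accesses per lookup), so promoting the hot 500-hit entry pays off. The (hits, depth) pairs below are taken from the figure.

```python
# A toy illustration of why rebalancing by hit counts reduces lookup cost.

def mem_accesses(entries: list[tuple[int, int]]) -> int:
    """entries: (hits, depth) pairs; returns total memory accesses."""
    return sum(hits * depth for hits, depth in entries)

before = [(20, 2), (70, 2), (50, 3), (500, 3)]   # hot 500-hit entry is deep
after  = [(500, 1), (20, 3), (70, 3), (50, 4)]   # hot entry promoted
print(mem_accesses(before), mem_accesses(after))  # 1830 970
```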

Page 29:

Dynamic code reconfiguration

• Modular code, modules interconnected via virtual queues;
• Counters periodically accounting for virtual-queue occupancy;
• If there is imbalance on the codepaths:
  • restructure code;
  • remap resources.

[Figure: example of the IP vs. MPLS balance across modules (MAC, IP, AQM, MPLS), with the critical codepath highlighted.]

Page 30:

Outlook: Adaptive methods in networking

• Improved performance, reduced power consumption;
• Adaptive instead of programmable networks!
  • Feedback control rather than programming/debugging;
  • Execution self-adjusts based on monitored knowledge:
    • data path: program code, data structures, resource assignment;
    • control path: routing, transport mechanisms;
  • A set of primitives out of which to compose functionality, rather than a program.
• Issues:
  • distributed feedback control;
  • ensuring control algorithms on different layers do not interfere.

Page 31:

Q & A

The End - Thank You!

Page 32:

Backup

Page 33:

Why Load Sharing at All? System Optimization!

Advantages:
• Maximize total load, while respecting a packet loss constraint;
• An M/M/m queue outperforms m separate M/M/1 queues;
• Fault tolerance;
• Scalability.

[Figure: a node without load sharing vs. one with load sharing.]

Drawbacks:
• Increased system complexity due to:
  • state information maintenance;
  • computing overhead.

Page 34:

Proof of Concept - Simple Simulator

• 8 outgoing links with various capacities, preceded by per-link queues;
• simple generated traffic: random (uniform) identifier vector, uniform packet burst probability;
• HRW weights initially set to 0.

Results:
• weight values asymptotically tend to the correct ones;
• queue utilization soon close to the total system utilization;
• a decrease in the standard deviation of queue occupancy shows the influence of the feedback control.

Page 35:

Maximal Per-Flow Fraction of the Interface Rate f Significantly Influences Performance

[Figure: packets dropped and flows remapped, as functions of f.]

The more a single flow may consume of the interface rate, the worse the adaptive load sharing method performs, in both the packet-loss and flow-remapping response variables.

Page 36:

Data Path in a Distributed Router

[Figure: N line cards and M NPUs, plus a Control Point (CP), connected by an input & output switch / shared memory. An incoming packet carries packet fields in the identifier vector v and additional packet fields in the information vector w. Steps:
1. Parse;
2. f(v) = 3;
3. StorePayload(1, v);
4. Request(3, w);
5. NextHop(w) = N;
6. GetPayload(1, v);
7. Switch Packet(N).]

Page 37:

Scalable HRW Weights Data Structure

• max(A, B, C, D) = max(max(A, B), max(C, D));
• adaptation on multiple tree levels;
• minimal flow disruption holds on the lowest level only!
• balance the tree: avoid nodes with few child nodes;
• avoid adaptation on higher levels: looser threshold, wider hysteresis;
• correlations among levels of the hierarchy when computing g(v, j): use an offset.

A sketch of the hierarchical evaluation follows below.
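A speculative sketch of the hierarchical idea: since max(A, B, C, D) = max(max(A, B), max(C, D)), the HRW argmax can be evaluated over a tree of processor groups, with a weight on a group node scaling (adapting) its whole subtree. The node shapes and the SHA-256 stand-in for g are assumptions, not the slides' design.

```python
# Hierarchical HRW evaluation sketch: group weights multiply subtree scores.
import hashlib

def g(v: bytes, j: int) -> float:
    h = hashlib.sha256(v + j.to_bytes(4, "big")).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def hrw_tree(v: bytes, node) -> tuple[int, float]:
    """node: ('leaf', j, x_j) or ('group', w, [children])."""
    if node[0] == "leaf":
        _, j, xj = node
        return j, xj * g(v, j)
    _, w, children = node
    best = max((hrw_tree(v, ch) for ch in children), key=lambda p: p[1])
    return best[0], w * best[1]    # group weight scales the whole subtree

tree = ("group", 1.0, [
    ("group", 1.0, [("leaf", 0, 1.0), ("leaf", 1, 1.0)]),
    ("group", 0.8, [("leaf", 2, 1.0), ("leaf", 3, 1.0)]),  # de-weighted group
])
print(hrw_tree(b"some-flow-id", tree)[0])  # chosen processor index
```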

Page 38:

Flow-to-Processor Mapping Bkp.

Def.: Flow-to-processor mapping function f, f(v): V → M:

f(v) = j  ⟺  xj · g(v, j) = maxk { xk · g(v, k) },

where v is the flow identifier vector, x = (x1, ..., xm) is a weights' vector and g(v, j) ∈ (0, 1) is a pseudorandom function of uniform distribution. The weights' vector x is in a 1-to-1 relationship to p = (p1, ..., pm), the vector of traffic fractions received at each processor.

Highest Random Weight (HRW) Mapping, Thaler, Ravishankar, 1997, Ross, 1998, CARP Protocol.

[Figure: example with 3 processors; x1 · g(v, 1), x2 · g(v, 2), x3 · g(v, 3) are compared and the packet maps to the maximum.]

Page 39:

HRW Mapping - How and Why Does It Work? – BKP

[Figure: bar charts comparing xj · g(v, j) for three processors, and again after adding a fourth processor with weight x4; in each case the flow maps to the maximum, with xMAX marking the largest weight.]

Page 40:

HRW Mapping Properties Examples – Bkp.

Minimal disruption of mapping in case of processor addition (adding a 4th processor):

[Figure: xj · g(v, j) compared before and after the addition; only flows whose maximum becomes g(v, 4) change mapping.]

Load balancing over heterogeneous processors: the weights' vector x is in a 1-to-1 correspondence to p = (p1, ..., pm), the vector of traffic fractions received at each processor.

