Intel Research
Adaptive Load Sharing for Multiprocessor Network Nodes
Lukas Kencl
Intel Research Cambridge
UCL, November 12, 2003
Outline
• Adaptive load sharing method
  • Flow-to-processor mapping
  • Adaptation with minimal disruption
  • Method validation and application
• Research: Further methods
  • Adaptive data structures
  • Dynamic code reconfiguration
• Outlook: Adaptive methods in networking
Adaptive Load Sharing for Multiprocessor Network Nodes
Ph.D. Work: IBM Zurich Research & EPFL, Lausanne
2 parts:
• Flow-to-processor mapping
• Adaptation with minimal disruption
Multiprocessor Network Node as a Load Sharing System
Assumptions:
• Data arrives in packetized flows.
• Any processor can process any packet.
• Heterogeneous processor capacity μ_j.
[Figure: multiple (N) inputs 1..N feeding multiple (M) processors 1..M.]
Task:
• Load on processors within some measure of balance.
• Same flow to same processor (reordering, context).
Advantage: system optimization. Drawback: complexity, overhead.
Acceptable Load Sharing as the Measure of Balance
• Processing load on processor j: λ_j(t)
• Capacity of processor j: μ_j
• Workload intensity on processor j: ρ_j(t) = λ_j(t) / μ_j
• Total system workload intensity: ρ(t) = Σ_j λ_j(t) / Σ_j μ_j
"No single processor is overutilized if the system in total is not overutilized, and vice versa."
Acceptable load sharing:
if ρ(t) ≤ 1, then ∀j: ρ_j(t) ≤ 1;
if ρ(t) > 1, then ∀j: ρ_j(t) > 1.
Acceptable load sharing minimizes packet loss probability!
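The acceptable-load-sharing condition can be sketched as a small check (illustrative Python; the function names are mine, not from the talk):

```python
# Acceptable load sharing: no single processor is overutilized iff the
# system in total is not overutilized, and vice versa.

def workload_intensities(loads, capacities):
    """Per-processor intensity rho_j = lambda_j / mu_j and the system rho."""
    per_proc = [lam / mu for lam, mu in zip(loads, capacities)]
    system = sum(loads) / sum(capacities)
    return per_proc, system

def is_acceptable(loads, capacities):
    """Check the acceptable load sharing condition."""
    per_proc, system = workload_intensities(loads, capacities)
    if system <= 1:
        return all(rho <= 1 for rho in per_proc)
    return all(rho > 1 for rho in per_proc)
```

For example, loads (1.5, 0.1) on two unit-capacity processors give a system intensity of only 0.8, yet processor 1 is overutilized, so the sharing is not acceptable.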
Minimizing Disruption
Goal: acceptable load sharing without maintaining flow state information, and yet minimizing the probability of mapping disruption (flow remapping).
NP-complete problem (Integer Linear Programming):
max Σ_v α_v(t) · Σ_j ( 1{f(t−Δt)(v) = j} · 1{f(t)(v) = j} ),
while Σ_v a_v(t) · l(v) · 1{f(t)(v) = j} = λ_j(t) ≤ μ_j, ∀j,
where
v – flow identifier vector in the packet header,
f(t)(v) – function mapping flows to processors, changing over time,
α_v(t) ∈ {0,1} – indicator whether v has appeared in the intervals (t−2Δt, t−Δt) and (t−Δt, t),
a_v(t) – how many times v has appeared in the interval (t−Δt, t),
l(v) – load per packet carrying v,
Δt – iteration interval.
Even if we knew all the flow state information, this remains an NP-complete problem – heuristics.
Flow-to-Processor Mapping
[Figure: an incoming packet carries the flow identifier vector v; multiple (N) inputs feed multiple (M) processors; here f(v) = 3.]
Weights vector x = (x_1, ..., x_M).
Upon packet arrival, a decision is made where to process the packet, based on the flow identifier and a set of weights. A flow-to-processor mapping f is thus established.
Flow-to-Processor Mapping
Def.: Flow-to-processor mapping function f, f(v): V → {1, ..., M}:
f(v) = j ⟺ x_j · g(v, j) = max_k x_k · g(v, k),
where v is the flow identifier vector, x = (x_1, ..., x_m) is a weights' vector and g(v, j) ∈ (0, 1) is a pseudorandom function of uniform distribution.
Highest Random Weight (HRW) Mapping: Thaler, Ravishankar, 1997; Ross, 1998; CARP Protocol; Windows NT LB.
Example: 3 processors with homogeneous processing capacity (weight x_i = 1, ∀i). Compare g(v, 1), g(v, 2), g(v, 3) and map to the maximum; here g(v, 2) wins, so map to processor 2.
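A minimal sketch of the HRW decision in Python (my illustration: the deck does not fix a particular hash, so SHA-256 over the pair (v, j) stands in for the pseudorandom g):

```python
# HRW mapping sketch: f(v) = argmax_j x_j * g(v, j).
import hashlib

def g(v: bytes, j: int) -> float:
    """Pseudorandom function of (flow id, processor), roughly uniform on (0, 1)."""
    h = hashlib.sha256(v + j.to_bytes(4, "big")).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def hrw_map(v: bytes, x) -> int:
    """Map flow identifier v to a processor (numbered from 1) by max weighted hash."""
    return max(range(1, len(x) + 1), key=lambda j: x[j - 1] * g(v, j))
```

Because the decision depends only on v and the weights, the same flow always reaches the same processor and no per-flow state is needed.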
HRW Mapping Favourable Properties
• Minimal disruption of mapping in case of processor addition. Example: add processor No. 4; vectors are mapped either (i) as before the addition or (ii) to the newly added processor – a minimal number of vectors change mapping.
• Load balancing over heterogeneous processors: the weights' vector x is in a 1-to-1 correspondence to p = (p_1, ..., p_m), the vector of traffic fractions received at each processor.
• The pseudorandom function g(v, j) ∈ (0, 1) can be implemented as a fast-computable hash function.
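The minimal-disruption claim for processor addition can be checked empirically with a hash-based sketch (my illustration, not the deck's code): every flow either keeps its processor or moves to the newly added one.

```python
# Adding a processor under HRW: flows only ever move to the new processor.
import hashlib

def g(v: bytes, j: int) -> float:
    h = hashlib.sha256(v + j.to_bytes(4, "big")).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def hrw_map(v: bytes, x) -> int:
    return max(range(1, len(x) + 1), key=lambda j: x[j - 1] * g(v, j))

def addition_disruption(n_flows: int = 1000):
    """Return (every move goes to the new processor, fraction of flows moved)."""
    before = [hrw_map(str(i).encode(), [1.0] * 3) for i in range(n_flows)]
    after = [hrw_map(str(i).encode(), [1.0] * 4) for i in range(n_flows)]
    moved = [(b, a) for b, a in zip(before, after) if b != a]
    return all(a == 4 for _, a in moved), len(moved) / n_flows
```

With uniform weights, roughly a quarter of the flows move, all of them onto processor 4.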
Adaptation through Feedback
Control loop (via a Control Point, CP, attached to the N input cards and M processors):
1. Obtain the filtered workload intensity ρ_j(t) of each processor j.
2. Evaluate ρ(t) = (ρ_1(t), ρ_2(t), ..., ρ_m(t)) (compare against the threshold).
3. Compute new weights x(t) = (x_1(t), ..., x_m(t)).
4. Download the new weights: x := x(t).
• The trigger definition targets preventing overload if the system in total is not overloaded, and vice versa.
• A threshold triggers adaptation when close to the load sharing bounds.
• The flow-to-processor mapping f becomes a function of time, f(t)(v).
• Adaptation may cause flow remapping! How to minimize the amount remapped?
Problem: incoming requests are packets, not flows! Packets are not evenly distributed over flows, hence not evenly distributed over the request object space, so HRW mapping alone is not sufficient for the acceptable load sharing bounds – we need to adapt!
Adaptation Algorithm
Start → compute the filtered processor workload intensity ρ(t) → trigger adaptation? (Triggering Policy)
• No: wait time Δt, then recompute.
• Yes: adapt the weights' vector x and upload it (Adaptation Policy), then recompute.
Triggering Policy
Dynamic workload intensity threshold:
θ'(t) = 1/2 · (1 + ρ(t))
Hysteresis bounds:
upper: (1 + δ_H(t)) · ρ(t),
lower: (1 − δ_H(t)) · ρ(t).
Triggering threshold:
θ(t) = max(θ'(t), upper), or vice versa.
Triggering policy:
(i) if ρ(t) ≤ 1 and max_j ρ_j(t) > θ(t), then adapt;
(ii) if ρ(t) > 1 and min_j ρ_j(t) < θ(t), then adapt.
Example:
(ρ_1, ρ_2, ρ_3)(t) = (0.8, 0.2, 0.2), ρ(t) = 0.4, θ(t) = 0.7;
ρ_1(t) > θ(t), therefore adapt.
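The triggering policy can be sketched as follows (illustrative Python; how the hysteresis bound combines with θ'(t) under "or vice versa" is my reading of the slide):

```python
# Triggering policy: adapt when some processor strays past the dynamic
# threshold theta(t), on the side that matters for the system state.

def threshold(rho_total: float, delta_h: float = 0.05) -> float:
    """theta'(t) = (1 + rho)/2, guarded by the hysteresis bound."""
    theta_prime = 0.5 * (1.0 + rho_total)
    if rho_total <= 1:
        return max(theta_prime, (1 + delta_h) * rho_total)
    return min(theta_prime, (1 - delta_h) * rho_total)

def should_adapt(rho_per_proc, rho_total: float) -> bool:
    th = threshold(rho_total)
    if rho_total <= 1:
        return max(rho_per_proc) > th   # some processor too heavily loaded
    return min(rho_per_proc) < th       # some processor too lightly loaded
```

On the slide's example, ρ(t) = 0.4 gives θ(t) = 0.7, and ρ_1(t) = 0.8 > 0.7 triggers adaptation.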
Adaptation Policy: Minimal Disruption Example (3 Proc.)
Example: the weights x_1 and x_2 are reduced by a single, invariable multiplier (2/3), while x_3 is unaltered; the decision becomes the maximum of 2/3 · x_1 · g(v, 1), 2/3 · x_2 · g(v, 2) and x_3 · g(v, 3).
• The reduced processors receive less, the unaltered one receives more, if the reduction is by a single, invariable multiplier.
• Minimal disruption of the mapping.
Adaptation Policy: Minimal Disruption
• A, B – mutually exclusive subsets of M = {1, ..., m}, M = A ∪ B.
• δ ∈ (0, 1).
• f, f' – two HRW mappings with the weights' vectors x, x':
  x'_j = δ · x_j, ∀j ∈ A,
  x'_j = x_j, ∀j ∈ B.
• p_j, p'_j – fraction of objects mapped to node j using f, f'.
Then:
1) p'_j ≤ p_j, ∀j ∈ A,
   p'_j ≥ p_j, ∀j ∈ B.
2) The fraction of objects mapped to a different node by each mapping is MINIMAL, that is, equal to |p'_j − p_j| · |V| at every node j.
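This property can be illustrated empirically (my sketch, with a hash standing in for g): scaling the weights of a subset A by δ moves flows only out of A into B, never the other way, and never between members of B.

```python
# Minimal disruption under weight scaling: flows only ever leave the scaled
# set A; members of B keep all the flows they had.
import hashlib

def g(v: bytes, j: int) -> float:
    h = hashlib.sha256(v + j.to_bytes(4, "big")).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def hrw_map(v: bytes, x) -> int:
    return max(range(1, len(x) + 1), key=lambda j: x[j - 1] * g(v, j))

def scaling_moves_only_a_to_b(delta: float, n_flows: int = 2000) -> bool:
    a_set = {1, 2}                                   # A = {1, 2}, B = {3, 4}
    x = [1.0, 1.0, 1.0, 1.0]
    x_new = [delta * w if j + 1 in a_set else w for j, w in enumerate(x)]
    for i in range(n_flows):
        before = hrw_map(str(i).encode(), x)
        after = hrw_map(str(i).encode(), x_new)
        if before != after and not (before in a_set and after not in a_set):
            return False                             # an illegal move occurred
    return True
```

The reason is that scaling all of A by the same δ preserves the relative order of A's scores and leaves B's scores untouched, so a flow can only change winner when a B member overtakes its former A winner.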
Adaptation Policy
Let ρ(t) ≤ 1. Then:
x_j(t) := c(t) · x_j(t−Δt), if ρ_j(t) > θ(t) (ρ_j exceeds threshold θ(t)),
x_j(t) := x_j(t−Δt), if ρ_j(t) ≤ θ(t) (ρ_j does not exceed threshold θ(t)).
If ρ(t) > 1, the adaptation is carried out in a symmetrical manner.
The weights' multiplier coefficient c(t):
c(t) = ( θ(t) / min{ ρ_j(t) | ρ_j(t) > θ(t) } )^(1/m)
Factor c(t) is proportional to the minimal error and to the number of nodes.
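One adaptation step for ρ(t) ≤ 1 can be sketched as follows (illustrative Python; the exponent 1/m in c(t) is my reading of the garbled slide formula and should be checked against the original paper):

```python
# Adaptation policy: scale down the weights of overloaded processors by c(t).

def adapt_weights(x, rho, theta):
    """One weight update step for the case rho(t) <= 1."""
    m = len(x)
    over = [r for r in rho if r > theta]
    if not over:
        return list(x)                        # nothing exceeds the threshold
    c = (theta / min(over)) ** (1.0 / m)      # c < 1: the minimal correction
    return [c * w if r > theta else w for w, r in zip(x, rho)]
```

On the triggering example (x = (1, 1, 1), ρ = (0.8, 0.2, 0.2), θ = 0.7), only x_1 shrinks, by c = (0.7/0.8)^(1/3) ≈ 0.956.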
Validation
Expectations
• Workload intensity on individual processors close to that of the system in total (acceptable load sharing);
• Packet loss probability lowered (acceptable load sharing);
• Persistent flows (appearing in two consecutive iterations) seldom remapped (minimize disruption).
"Realistic" Generated Traffic and Router System Model
Traffic characterization (approximated from various published OC-3 statistics):
• Number of packets per time interval;
• Number of flows per time interval;
• Measured flow length cumulative distribution, complemented by Pareto to generate the heavy tail;
• Identifier vector distribution;
• Per-packet processing load distribution;
• Maximal per-flow fraction of the interface rate f.
Router system model:
• 8 processors, 13 interfaces;
• System workload intensity close to 1;
• 3 alternatives to load sharing (LS):
  • Naive (no LS),
  • Static (LS with static weights),
  • Adaptive LS.
Adaptation Keeps Per-processor Workload Intensity Close to Ideal
[Plots: per-processor workload intensity over time for Naive (no LS), Static LS and Adaptive LS; max and min over all processors shown.]
Packet Loss Significantly Reduced with the Adaptive Control Loop
[Plots: packet loss for Naive, Static, Adaptive and Ideal; packet loss in excess of Ideal for Static and Adaptive.]
Adaptive load sharing saves on average 60% of the packets dropped in excess by the static load sharing.
Minimal Disruption Property Ensures Few Flow Remappings
[Plot: flows per iteration – appearing, persistent and remapped.]
The adaptive control loop leads on average to:
• less than 0.05% of the appearing flows remapped per iteration;
• less than 0.2% of the persistent flows remapped per iteration.
Applications and Implementation
Extension to Prevent Remapping
Minimal disruption property – only a small amount of flows requires special treatment. Which ones? Keep state? What treatment?
Which ones: during transient periods between two mappings after adaptation:
• Compute both mappings (OLD and NEW) for each packet.
• If the mappings differ, apply special treatment.
Treatment:
• If a new flow (SYN packet), insert a classifier rule that maps to the NEW mapping.
• If an existing flow, insert a rule that maps to the OLD mapping.
• Monitor and delete terminated flows.
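The transient-period treatment can be sketched as a small dispatcher (my illustration, not the PowerNP code; map_old/map_new stand for the two HRW mappings and rules for the classifier table):

```python
# Transient dispatch: pin colliding flows so that no flow is ever remapped.

def dispatch(flow_id, is_syn, map_old, map_new, rules):
    """Return the processor for one packet; 'rules' is the classifier table."""
    old, new = map_old(flow_id), map_new(flow_id)
    if old == new:
        return new                       # mappings agree: no special treatment
    if flow_id not in rules:             # first packet of a collision flow
        rules[flow_id] = new if is_syn else old
    return rules[flow_id]                # rule hit: stable per-flow decision
```

A monitor would delete a rule once its flow terminates, keeping the classifier small; by the minimal disruption property only a tiny fraction of flows ever collide.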
Server Load Balancer on the IBM PowerNP with Zero Remappings
[Figure: an IBM PowerNP dispatching to multiple (M) servers S1...SM. Each incoming packet is hashed under both weight sets, H_NEW (x_NEW) and H_OLD (x_OLD). If H_NEW = H_OLD, the packet is forwarded directly. If they differ, the classifier is consulted: on a rule hit, old flows follow the old mapping and new flows the new one; on a rule miss, the collision flow is redirected to the Control Point (CP), which decides new vs. old flow, installs the rule and sends the packet on.]
Adaptive Load Sharing: Summary
• hash-based;
• minimum state information;
• adaptive, yet minimum flow disruptions;
• multiprocessor network node transformed into a parallel computer;
• wide scope of applications (server LB, NP dispatcher, distributed router, etc.).
Research: Further methods
Adaptive lookup/classification on a multiprocessor system
• Splitting a large lookup table/rule base into several smaller consecutive sub-tables/rule bases.
• Each sub-table/rule base has a dedicated processor (microengine).
• Boundaries adaptively tuned according to the load on the processors.
• Sub-data structures adapting to the enforced traffic locality.
[Figure: a lookup table over the key space 0 to 2^32 − 1, partitioned among microengines ME0–ME3; a rule base partitioned analogously.]
Adaptive lookup table
• Table entries typically organized in a tree structure;
• Adapting to the traffic patterns – rebalancing the tree according to the hits distribution when updating the table;
• Problem: worst case – optimization.
Example (binary trie over the prefixes 00, 10, 100/101, 10010, 10011, with per-entry hit counts 20, 70, 50 and 500):
Before rebalancing, the 500-hit entry sits at depth 3 (3 memory accesses per lookup):
MemAccesses: 2·20 + 2·70 + 3·50 + 3·500 = 1830.
After rebalancing toward the hits distribution, the 500-hit entry is reached in 1 memory access, at the cost of pushing the cold entries deeper:
MemAccesses: 1·500 + 3·20 + 3·70 + 4·50 = 970.
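The memory-access arithmetic generalizes to a hit-weighted sum of entry depths (one access per trie level); a tiny sketch:

```python
# Total lookup cost = sum over entries of (depth in the trie) * (hit count).

def mem_accesses(entries):
    """entries: iterable of (depth, hits) pairs."""
    return sum(depth * hits for depth, hits in entries)

# The slide's example: before and after rebalancing around the 500-hit entry.
before = [(2, 20), (2, 70), (3, 50), (3, 500)]
after = [(3, 20), (3, 70), (4, 50), (1, 500)]
```

mem_accesses(before) is 1830 and mem_accesses(after) is 970, a roughly 47% reduction for the same table contents.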
Dynamic code reconfiguration
• Modular code, modules interconnected via virtual queues;
• Counters periodically accounting for virtual queue occupancy;
• If imbalance on the codepaths:
  • Restructure code,
  • Remap resources.
Example: IP vs. MPLS balance.
[Figure: packet pipeline MAC → IP / MPLS → AQM; the MPLS branch forms the critical codepath.]
Outlook: Adaptive methods in networking
• Improved performance, reduced power consumption;
• Adaptive instead of programmable networks!
  • Feedback control rather than programming/debugging;
  • Execution self-adjusts based on monitored knowledge:
    • Data path: program code, data structures, resource assignment;
    • Control path: routing, transport mechanisms;
  • A set of primitives out of which to compose functionality, rather than a program.
• Issues:
  • Distributed feedback control;
  • Control algorithms on different layers must not interfere.
Q & A
The End - Thank You!
Backup
Why Load Sharing at All? System Optimization!
[Figure: no load sharing vs. load sharing.]
Advantages:
• Maximize total load, while respecting a packet loss constraint;
• M/M/m queue > m × M/M/1 queues;
• Fault tolerance;
• Scalability.
Drawbacks:
• Increased system complexity due to:
  • state information maintenance;
  • computing overhead.
Proof of Concept - Simple Simulator
• 8 outgoing links with various capacities, preceded by per-link queues;
• simple, generated traffic – random (uniform) identifier vector, uniform packet burst probability;
• HRW weights initially set to 0;
Results:
• weights values asymptotically tend to the correct ones;
• queue utilization soon close to the total system utilization;
• decrease in standard deviation of queue occupancy shows the influence of feedback control.
Maximal Per-Flow Fraction of the Interface Rate f Significantly Influences Performance
[Plots: packets dropped and flows remapped, in dependence on f.]
The more a single flow may consume from the interface rate, the worse the adaptive load sharing method performs, in both the packet loss and flow remapping response variables.
Data Path in a Distributed Router
[Figure: line cards 1..N, NPUs, a Control Point (CP) and a shared input/output switch. The incoming packet carries fields contained in the identifier vector v and additional fields contained in the information vector w.]
1. Parse the incoming packet (extract v and w).
2. f(v) = 3: the flow is mapped to NPU 3.
3. StorePayload(1, v): the payload is stored at the ingress card 1.
4. Request(3, w): the processing request is sent to NPU 3.
5. NextHop(w) = N: NPU 3 determines the next hop, output N.
6. GetPayload(1, v): the payload is retrieved from card 1.
7. SwitchPacket(N): the packet is switched to output N.
Scalable HRW Weights Data Structure
• max(A, B, C, D) = max(max(A, B), max(C, D)) – the HRW maximum can be computed hierarchically;
• adaptation on multiple tree levels;
• minimal flow disruption holds on the lowest level only!!!
• balance the tree - avoid nodes with few child nodes;
• avoid adaptation on higher levels – looser threshold, wider hysteresis;
• correlations among levels of hierarchy when computing g(v, j) – use offset.
Flow-to-Processor Mapping - Bkp.
Def.: Flow-to-processor mapping function f, f(v): V → {1, ..., M}:
f(v) = j ⟺ x_j · g(v, j) = max_k x_k · g(v, k),
where v is the flow identifier vector, x = (x_1, ..., x_m) is a weights' vector and g(v, j) ∈ (0, 1) is a pseudorandom function of uniform distribution. The weights' vector x is in a 1-to-1 relationship to p = (p_1, ..., p_m), the vector of traffic fractions received at each processor.
Highest Random Weight (HRW) Mapping: Thaler, Ravishankar, 1997; Ross, 1998; CARP Protocol.
[Figure: example with 3 processors – map v to the maximum of x_1 · g(v, 1), x_2 · g(v, 2), x_3 · g(v, 3).]
HRW Mapping - How and Why Does It Work? – Bkp.
[Figure: with 3 processors, v maps to the maximum of x_1 · g(v, 1), x_2 · g(v, 2), x_MAX · g(v, 3); after adding a 4th processor with weight x_4, the candidate x_4 · g(v, 4) joins the comparison, so an existing assignment changes only where the new candidate wins.]
HRW Mapping Properties Examples – Bkp.
Minimal disruption of mapping in case of processor addition (add 4th processor):
[Figure: before, v maps to the maximum of g(v, 1), g(v, 2), g(v, 3); after, g(v, 4) joins the comparison, so vectors either keep their mapping or move to processor 4.]
Load balancing over heterogeneous processors: the weights' vector x is in a 1-to-1 correspondence to p = (p_1, ..., p_m), the vector of traffic fractions received at each processor.