On the Power of Preprocessing and Reconfigurable Networks
Klaus-Tycho Foerster, University of Vienna, 20 December 2018
• Networks change slowly, so why do we run distributed protocols from scratch?
◦ Use preprocessing for faster runtimes!
• Many data center designs are hybrid in nature, but treat the parts separately
◦ Use static and reconfigurable topology as a joint, non-segregated resource
• Wide-area optical links can change bandwidth, but rates are usually fixed
◦ Let links become rate-adaptive, according to environmental conditions
20/12/2018 On the Power of Preprocessing and Reconfigurable Networks @UESTC, China Page 2
Leveraging Unused Network Resources
IEEE/ACM ANCS 2018, ACM SIGCOMM 2018, IEEE INFOCOM 2019
• Decentralization aids scalability
◦ But: Many problems are not “local” (e.g., coloring)
- Spanning tree, shortest path, minimizing congestion, good optimization algorithms
• Preprocessing helps scalability (e.g., breaking symmetries ahead of time)
◦ A completely unknown network state is too strong an assumption for many scenarios
◦ Often we just react to events; the physical topology of wired networks does not grow suddenly
• Case study: in Software-Defined Networking, a single (logically centralized) controller does not scale
◦ Create many local controllers that can react quickly and that each control a small set of “dumb” nodes
Practical Motivation for Preprocessing
• Communication is often the limiting factor in distributed computing
• We want to solve problems fast (e.g., constant time, polylogarithmic time)
• More formally, we consider the well-studied LOCAL model
◦ Communication happens in synchronous rounds
◦ In each round, each of the n nodes can
- Send messages to its neighbors
- Receive messages from its neighbors
- Compute arbitrary functions
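To make the round structure concrete, here is a minimal Python sketch of a synchronous LOCAL-style simulation. The function name `run_local` and the flooding task (every node forwards all IDs it has heard of) are illustrative assumptions, not part of the talk. The sketch shows that after t rounds a node knows exactly the IDs within distance t.

```python
def run_local(adj, rounds):
    """Simulate synchronous rounds; adj maps node -> list of neighbors."""
    known = {v: {v} for v in adj}          # initially a node knows only its own ID
    for _ in range(rounds):
        # Send + receive: each node collects the current knowledge of its neighbors.
        inbox = {v: [known[u] for u in adj[v]] for v in adj}
        # Compute: merge everything received (message size is unbounded in LOCAL).
        known = {v: known[v].union(*inbox[v]) for v in adj}
    return known

# Ring with 8 nodes: node i is adjacent to i-1 and i+1 (mod 8).
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(sorted(run_local(ring, 2)[0]))       # IDs within distance 2 of node 0
```

On this ring, node 0 needs 4 rounds before it has heard of every node, which is why global tasks cost time proportional to the diameter.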
Focus of Our Study: Communication
We will assume unlimited computational power and storage, but won’t abuse this ☺
• Maybe the most fundamental problem in distributed computing
• Task: Color nodes s.t. neighboring nodes have different color
◦ Optimization: Use few colors!
• Common application: symmetry breaking
• Example many might know: choosing WiFi-channels to minimize interference
Example: Coloring
• 2-coloring:
◦ Needs Ω(n) rounds
• 3-coloring:
◦ Needs non-constant time
• Cannot improve in the LOCAL model
◦ Intuition: Picking a color affects nodes in super-constant distance
Coloring of rings (LOCAL model)
• 2-coloring:
◦ 0 rounds ☺
• 3-coloring:
◦ 0 rounds ☺
Coloring of rings (LOCAL model) – with Preprocessing
• How about a coloring of a subgraph?
• LOCAL model: runtime does not change
• With preprocessing: fast!
◦ Coloring remains valid
- (but: might no longer be optimal!)
• What are further application scenarios?
• What else can we do with the SUPPORT of Preprocessing?
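The subgraph argument can be checked with a small Python sketch. The helper names `precolor_ring` and `is_proper` and the choice of a 7-node ring are assumptions made for illustration. The coloring is computed once on the support ring and then reused, with 0 rounds of communication, on a subgraph.

```python
def precolor_ring(n):
    """Preprocessing on the support ring with nodes 0..n-1 (n >= 3)."""
    colors = [i % 2 for i in range(n)]     # alternate colors 0 and 1
    if n % 2 == 1:
        colors[-1] = 2                     # odd ring: the last node needs a third color
    return colors

def is_proper(edges, colors):
    """True if no edge joins two nodes of the same color."""
    return all(colors[u] != colors[v] for u, v in edges)

n = 7
ring_edges = [(i, (i + 1) % n) for i in range(n)]
colors = precolor_ring(n)

# Problem instance G: the ring with edges (3,4) and (6,0) removed.
sub_edges = ring_edges[:3] + ring_edges[4:6]
print(is_proper(ring_edges, colors), is_proper(sub_edges, colors))
```

The subgraph here consists of paths, which are 2-colorable, yet the reused coloring still spends 3 colors: valid, but no longer optimal.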
Coloring of rings (LOCAL model) – with Preprocessing & Subgraphs
• Extends the LOCAL model (with unique IDs) with preprocessing
• Original structure given as the SUPPORT graph H=(V(H),E(H))
• Problem instance is a subgraph G=(V,E) of H
• Two phases:
1. Preprocessing: compute any function on H and store output locally
2. Solve problem on G in LOCAL model with preprocessed outputs
- Runtime: the number t of rounds in phase 2, denoted as SUPPORTED(t)
The SUPPORTED Model
[Figure: problem instance G as a subgraph of the support graph H]
Active variant: nodes may also communicate over the support graph H
Unique IDs: e.g., MAC addresses
• Task: Leader election (Θ(diameter) runtime in LOCAL model)
◦ Easy if G=H: precompute leader, 0 rounds
◦ But for different G:
- We need to compute a leader for each connected component of G!
• Component has no leader? Re-elect
• Component has multiple leaders? Re-elect
• Components can have asymptotically same diameter
• SUPPORTED model does not provide a “silver bullet”
◦ Not even for the active variant
Does the SUPPORTED Model make everything easy?
• Let the support graph H be a complete graph
• What sort of meaningful information (for G) can we precompute?
◦ Upper bound on ID-space / network size…?
◦ Problem: G can be arbitrary
• For example, if a SUPPORTED algorithm has polylogarithmic runtime
◦ ∃ LOCAL algorithm with constant factor overhead
What about the General Case?
Idea: simulate that the support graph H is a complete graph
In the active model: Congested Clique (quite powerful, more later…)
• Real topologies are usually not complete graphs
• Case study: planar graphs
◦ Remain planar under edge deletions
◦ Are 4-colorable
But: Restricted Graph Families are Useful ☺
„Geloeste und ungeloeste Mathematische Probleme aus alter und neuer Zeit" by Heinrich Tietze
http://www.math.harvard.edu/~knill/graphgeometry/faqg.html
• Task: Find subset D of nodes s.t. every node
◦ Has a neighbor in D or is in D
• Can we pre-compute?
◦ A bad one yes: everyone in D!
◦ But not an optimal one!
- Graph can look very different
Case Study: Dominating Set
• (1+δ)-approximation not possible in constant time [Czygrinow et al., DISC 2008]
◦ But maybe in the SUPPORTED model?
• Let's analyze their LOCAL algorithm:
◦ Find weight-appropriate pseudo-forest [constant time ☺]
◦ 3-color pseudo-forest [non-constant time ☹]
◦ Run clustering/optimization algorithms on components of constant size [constant time ☺]
• Also works for O(1)-genus graphs [extending work of Akhoondian Amiri et al., PODC 2016]
◦ Also for planar graphs for maximum independent set & maximum matching
Case Study: Minimum Dominating Set in Planar Graphs
Max out-degree of 1
SUPPORTED speed-up: 1) precompute 4-coloring 2) reduce 4-colored pseudo-forest to 3 colors in 2 rounds
[constant time SUPPORTED model ☺]
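The slides do not spell out how the 4-colored pseudo-forest is reduced to 3 colors. One standard way that needs two rounds is the "shift-down" technique from the distributed coloring literature; treating it as the method meant here is an assumption. The sketch below also assumes that every node has at most one out-neighbor (`parent`) and that the proper 4-coloring `color` (values 0 to 3) was precomputed on the planar support graph. All names are hypothetical.

```python
def reduce_4_to_3(parent, color):
    """parent[v]: v's only out-neighbor or None; color: proper coloring with values 0..3."""
    # Round 1 (shift down): adopt the parent's color; a root picks another color in 0..2.
    shifted = {}
    for v, p in parent.items():
        if p is not None:
            shifted[v] = color[p]
        else:
            shifted[v] = min(c for c in range(3) if c != color[v])
    # All children of v now carry v's old color, so v sees at most two neighbor colors.
    # Round 2: every node that ended up with color 3 picks a free color in 0..2.
    final = dict(shifted)
    for v, p in parent.items():
        if shifted[v] == 3:
            taken = {color[v]}             # the color shared by all children of v
            if p is not None:
                taken.add(shifted[p])
            final[v] = min(c for c in range(3) if c not in taken)
    return final

# Pseudo-forest: a cycle 0 -> 1 -> 2 -> 0 with a tail 4 -> 3 -> 0, plus a tree 6 -> 5.
parent = {0: 1, 1: 2, 2: 0, 3: 0, 4: 3, 5: None, 6: 5}
color = {0: 0, 1: 3, 2: 1, 3: 3, 4: 2, 5: 3, 6: 0}
final = reduce_4_to_3(parent, color)
```

Nodes with color 3 form an independent set after round 1, so they can all recolor at the same time without conflicts.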
• What can we do with it? Next slide
• What is it? “Sequential LOCAL” model
◦ Simplified: Let nodes compute in arbitrary but sequential order
◦ May store and check outputs in distance t
◦ t is the (sequential) locality of a problem, denoted SLOCAL(t)
• Example: Maximal Independent Set
◦ Pick node set IS s.t. no node in IS has a neighbor in IS
◦ Maximal: No more nodes can be added to IS
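As a minimal illustration of sequential locality, the greedy algorithm below processes nodes one by one and lets each node read only the decisions of its direct neighbors, so it has locality 1. The function name `slocal_mis` and the example path are assumptions for illustration.

```python
def slocal_mis(adj, order):
    """Greedy MIS: nodes decide one by one, reading only decisions at distance 1."""
    in_is = {}
    for v in order:                        # arbitrary but sequential order
        # Join IS unless an already processed neighbor has joined.
        in_is[v] = not any(in_is.get(u, False) for u in adj[v])
    return {v for v in adj if in_is[v]}

# Path 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(slocal_mis(path, [0, 1, 2, 3, 4]))
print(slocal_mis(path, [1, 3, 0, 2, 4]))
```

Different processing orders may give different sets, but each result is independent and maximal.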
Case Study: The SLOCAL model [Ghaffari et al., STOC 2017]
• Connection to SLOCAL model [Ghaffari et al., STOC 2017]
◦ SLOCAL(t) can be simulated in SUPPORTED(O(t · polylog n)): e.g., MIS in SUPPORTED(polylog n)
- The converse is not true, or is an open question
• Locally Checkable Labelings LCL:
◦ LCL in LOCAL(o(log n)) can be solved in O(1) in the SUPPORTED model
• Optimization problem: Maximum Independent Set, of size α(G)
◦ Set of size α(G) − εn in O(log_{1+ε} n) rounds, i.e., a (1+ε)-approximation if the maximum degree Δ is constant
◦ Cannot be approximated within a factor o(Δ/log Δ) in time o(log_Δ n) in the active SUPPORTED model
Further Results in the Active SUPPORTED Model
Also works in the passive model: SLOCAL(t) → SUPPORTED(Δ^O(t))
(Δ is the maximum node degree)
Also works without the active model
E.g., network size, restricted H, known inputs…
Use all edges of H for communication
Best LOCAL algorithm: 2^O(√(log n))
Hybrid/Reconfigurable Data Center Networks (DCNs)
• ProjecToR interconnect [Ghobadi et al., SIGCOMM ’16]
• Helios (core) [Farrington et al., SIGCOMM ’10]
• c-Through (HyPaC architecture) [Wang et al., SIGCOMM ’10]
• Rotornet (rotor switches) [Mellette et al., SIGCOMM ’17]
• Solstice (architecture & scheduling) [Liu et al., CoNEXT ’15]
• REACToR [Liu et al., NSDI ’15]
• FireFly [Hamedazimi et al., SIGCOMM ’14]
• … and many more …
• Results and conclusions often not portable
◦ Between topologies/technologies
• A common routing assumption (traffic uses either the static or the reconfigurable part) takes away optimality
• We take a look from a theoretical perspective
◦ With average path length as an objective
◦ For one switch (with/without this assumption)
◦ Also briefly for multiple switches
Hybrid/Reconfigurable Data Center Networks (DCNs)
The Static Case
[Figure: example topology on nodes A, B, C, D, E, F, G]
Communication frequency: A→E: 10, A→G: 5
Weighted average path length: 4*10+6*5=70
Adding Reconfigurability
[Figure: example topology on nodes A, B, C, D, E, F, G]
Communication frequency: A→E: 10, A→G: 5
Weighted average path length (static): 4*10+6*5=70
Weighted average path length (reconfig): 1*10+6*5=40
Weighted average path length (optimum): 1*10+(1+2)*5=25
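The three values can be reproduced with a short Python sketch. It assumes, based only on the distances 4 and 6 used in the arithmetic, that the static topology is the path A-B-C-D-E-F-G and that the single reconfigurable link connects A and E. The helper names `dist` and `cost` are hypothetical.

```python
from collections import deque

def dist(edges, src, dst):
    """Hop distance via BFS over undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, []):
            if w not in seen:
                seen[w] = seen[u] + 1
                queue.append(w)
    return seen[dst]

def cost(edges, demands):
    """Sum of frequency times hop distance over all demands."""
    return sum(freq * dist(edges, s, t) for (s, t), freq in demands.items())

static = list(zip("ABCDEF", "BCDEFG"))     # assumed path A-B-C-D-E-F-G
demands = {("A", "E"): 10, ("A", "G"): 5}
reconf = ("A", "E")                        # assumed reconfigurable link

static_cost = cost(static, demands)                  # 4*10 + 6*5
segregated = 10 * 1 + 5 * dist(static, "A", "G")     # A->G must stay on static links
joint = cost(static + [reconf], demands)             # A->G may use A-E, then E-F-G
print(static_cost, segregated, joint)
```

The gap between the last two values comes only from letting the A→G traffic mix the reconfigurable link with static links.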
Beyond a Single Switch
• Especially important at scale: multiple reconfigurable switches
• A Tale of Two Topologies [Xia et al., SIGCOMM ’17]
• Rotornet [Mellette et al., SIGCOMM ’17]
One Switch: Segregated Routing Policies
• Model: a route uses either just one reconfigurable link or just static links
[Figure: example topology on nodes A, B, C, D, E, F, G]
Communication frequency: A→E: 10, A→G: 5
Why this solution?
• Benefit of A→E (frequency 10): static − reconfig = 40 − 10 = 30
• Benefit of A→G (frequency 5): static − reconfig = 30 − 5 = 25
• Model: Either just 1 reconfig or just static
• Optimal solution in polynomial time:
1. Compute & assign benefit to every matching edge
2. Compute optimal weighted matching
- E.g., the weighted version of Edmonds’ blossom algorithm
• Downside: Only optimal under (artificially!?) segregated routing policy!
◦ Not optimal under arbitrary routing policies
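The two steps can be sketched in Python for the running example. The exhaustive search below is only a stand-in for a polynomial-time weighted matching algorithm such as the blossom algorithm and is suitable for tiny inputs only. The function name `best_matching` and the static distances 4 and 6 (taken from the example arithmetic) are assumptions.

```python
def best_matching(benefit):
    """Exhaustive maximum-weight matching; benefit maps frozenset({u, v}) -> gain."""
    edges = list(benefit.items())

    def solve(i, used):
        if i == len(edges):
            return 0, []
        best = solve(i + 1, used)                  # option 1: skip edge i
        pair, gain = edges[i]
        if not (pair & used):                      # option 2: take it if both ends are free
            sub_gain, sub_edges = solve(i + 1, used | pair)
            if gain + sub_gain > best[0]:
                best = (gain + sub_gain, [pair] + sub_edges)
        return best

    return solve(0, frozenset())

# Step 1: benefit of a direct reconfigurable link = frequency * (static distance - 1).
static_dist = {("A", "E"): 4, ("A", "G"): 6}
freq = {("A", "E"): 10, ("A", "G"): 5}
benefit = {frozenset(p): freq[p] * (static_dist[p] - 1) for p in freq}

# Step 2: pick the matching with the largest total benefit.
total, matching = best_matching(benefit)
print(total, [sorted(pair) for pair in matching])
```

Both candidate links share node A, so only one of them fits into a matching, and the one with benefit 30 wins.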
One Switch: Segregated Routing Policies
One Switch: Non-Segregated Routing
• Can improve routing quality
• NP-hard to compute optimally
◦ Already for simple settings (sparse communication patterns, unit weights, etc.)
• Future work: approximation algorithms & restricted topologies
• Already some work in different settings, e.g.:
◦ network forms a dynamic tree [Schmid et al., ToN ’16]
◦ constant degree and sparse demands [Avin et al., DISC ’17]
◦ degree depends on node popularity [Avin et al., Inf. Pr. Let. ’18]
(these works assume all links are reconfigurable)
• Makes the setting more scalable ☺
• But of course, still NP-hard (already for one switch)
• Let’s make things simpler
Multiple Reconfigurable Switches
• Can we optimize max. path length?
◦ For 2 flows?
- NP-hard again
Multiple Switches: More than One Flow
Multiple Switches: One Flow
• Consider weights
[Figure: topology on nodes A, B, C, D, E, F, G with link weights 1, 5, and 10]
Communication frequency: A→G: 1
How to formalize?
• Challenge:
◦ Proper matchings
◦ Polynomial algorithm
• Idea: Use flow algorithms
◦ Min-cost integral flow is polynomial
Multiple Switches: One Flow
A: capacity = 1*
*some small strings attached
Unidirectionality
• Same conceptual idea
[Figure: node A with copies Aout and Ain]
• one reconfigurable switch
◦ segregated: Easy. Not optimal.
◦ not seg.: NP-hard. Improves solutions.
• multiple reconfigurable switches
◦ multiple flows: NP-hard
◦ just one flow: Easy.
• next steps
◦ approximation algorithms, special topologies
Summary and Outlook
A big thank you to Rachee for supporting me with her slides!
Wide Area Networks
Costs O(100) million dollars per year
O(100) datacenters
Dedicated Wide Area Network
[SIGCOMM ’13]
[SIGCOMM ’14]
[SIGCOMM ’16]
O(100,000 miles) of fiber
O(1,000) optical devices
Fiber is scarce, expensive
Identify inefficiencies in the optical backbone to gain capacity and availability at reduced cost.
Gain 134 Tbps of capacity and prevent 25% of link failures in a large North American WAN.
Outline
• How inefficient are optical backbones?
• Dynamic capacity links in WANs
• Challenges in dynamically adapting link capacities
• Rate Adaptive WANs
Optical Backbone Networks
• Optical cross-connects (OXCs) switch optical signals
• Signal-to-noise ratio (SNR) measures signal quality
• At each OXC, measure signal quality
◦ 8,000 wavelengths
◦ Every 15 minutes
◦ February 2015 to June 2017
Longitudinal Signal Quality on Fiber
[Figure: SNR over time on one fiber (higher is better), with capacity thresholds from 50 Gbps to 200 Gbps in steps of 25 Gbps and the failure SNR marked]
Opportunity for capacity gain
[Figure: CDF of average SNR (dB), with capacity thresholds marked at 100, 125, 150, 175, and 200 Gbps]
For 8,000 wavelengths in WAN:
• Analyze average SNR
• Compare with thresholds for link capacity
64% of optical wavelengths can operate at 175 Gbps or more. 95% of optical wavelengths can operate at higher than 100 Gbps.
Opportunity for availability gain
• Distribution of link failure SNR
• Across WAN links
• For 2.5 years
25% of failures have SNR > 2.5 dB
These failures can be prevented by reducing link capacity to 50 Gbps
Our proposal
• Dynamically adapt link capacities in response to changes in SNR.
◦ Gain 134 Tbps capacity by increasing link capacity when SNR is high
◦ Prevent 25% of link failures by reducing link capacity when SNR is low
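A rate-adaptive link needs a rule that maps measured SNR to a capacity. The Python sketch below shows one simple rule, assuming a fixed threshold table. The threshold values in dB are hypothetical placeholders (the talk does not list them); only the capacity steps from 50 to 200 Gbps follow the slides. The function name `pick_capacity` is also an assumption.

```python
from bisect import bisect_right

# HYPOTHETICAL SNR thresholds in dB; placeholders, not measured values.
THRESHOLDS_DB = [2.5, 5.0, 6.5, 8.0, 9.5, 11.0, 12.5]
# Capacity steps in Gbps, one per threshold.
CAPACITY_GBPS = [50, 75, 100, 125, 150, 175, 200]

def pick_capacity(snr_db):
    """Highest capacity whose SNR threshold is met; 0 means the link is down."""
    i = bisect_right(THRESHOLDS_DB, snr_db)    # number of thresholds <= snr_db
    return CAPACITY_GBPS[i - 1] if i > 0 else 0

for snr in (2.0, 2.5, 7.0, 20.0):
    print(snr, pick_capacity(snr))
```

With such a rule, a drop in SNR lowers the rate step by step instead of taking the link down at the first threshold.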
Challenges in dynamically adapting link capacities
• Requires hardware support for capacity reconfiguration
• Requires re-thinking IP layer traffic engineering
Key question: Can we use commodity hardware for changing link capacities?
• Bandwidth Variable Transceiver in Arista 7504 linecards (Arista 7504 chassis)
◦ Supports higher order modulations (QPSK, 8-QAM, 16-QAM)
◦ Link capacity of 100G, 150G, 200G
Challenge 1: Adapting capacity on commodity h/w
[Figure: ports Ethernet 3/1/1 and Ethernet 4/1/1 connected through a variable optical attenuator; increasing noise from the attenuator causes a capacity downgrade from 200G to 150G and then to 100G, with the link down during each change]
Takes over 1 minute to change capacity → link downtime
Problem: Commodity hardware is not optimized for dynamically adapting link capacity.
Question: What causes latency of capacity reconfiguration?
[Figure: reconfiguration timeline of about 1 minute: link usable, turn off laser, program registers, turn laser on, laser is on, link usable; the link is not usable in between]
Majority of time spent in turning laser on.
Question: Can we reduce the latency of capacity reconfiguration by not turning off the laser?
Acacia BVT Evaluation Board
• Do not turn off laser in the evaluation board
• Program registers for modulation change
• Repeat experiment 200X
If the laser is left on, the outage is only 35 ms to change capacity
Question: How should traffic engineering incorporate dynamic capacity links?
• Capacity changes cause links to be unavailable for carrying traffic.
• Capacity changes lead to network churn and can be disruptive.
Solution: We design the Rate Adaptive Wide Area Network (RADWAN) traffic engineering controller.
• SNR-aware: knows possible capacity gain of each link
• Minimally disruptive: reconfigure capacity while minimizing network churn
• Rate adaptive: adapts link rates to meet demands and improve availability
RADWAN Traffic Engineering Formulation
• Inputs: network topology, demand matrix, optical topology and SNR, current flow allocation
• Optimization: objective, constraints
• Outputs: flow allocations, links to reconfigure
Proof of concept: RADWAN
[Figure: testbed with routers A, B, C, D connected by amplified fiber spans of 390 km, 375 km, 410 km, and 365 km]
Throughput Gains with RADWAN
[Figure: network throughput (Gbps) for SWAN [SIGCOMM ’13], SWAN-150, RADWAN, and RADWAN-hitless]
RADWAN has 40% higher network throughput compared to SWAN
Conclusion
• Physical layer today is configured statically
• We show that this leaves money on the table, in terms of
◦ Network performance capacity
◦ Link availability
◦ Equipment cost ($/Gbps)
• RADWAN introduces programmability in Layer 1
◦ Improves network throughput by 40%
◦ Reduces link downtime by a factor of 18
◦ Reduces equipment cost ($/Gbps) by 32%
Interested in an Overview on Algorithmic Problems in Reconfigurable Networks?
To appear in ACM SIGCOMM Computer Communication Review
Preprint available at https://www.univie.ac.at/ct/stefan/
• Links need to be repaired from time to time
• Can repair procedures be predicted?
• In which order should links be repaired to keep the network capacitated?
• Next: What to do if links fail unexpectedly?
Another area of our interest: Link failures
ACM SIGCOMM 2017
• No re-convergence
• No header changes
• Only local information in routing table
• Idea: Compute arc-disjoint spanning trees
• Switch trees when hitting failure
• Resilience: depends on number of trees
• Quality: depends on tree computation
Precompute Alternative Routes
ACM SIGCOMM Computer Communication Review 2018
IEEE INFOCOM 2019
• Alternative method:
◦ Use Segment Routing
◦ Available in IPv6
◦ Idea: Push segments on destination stack
• This paper: carry failures as well
◦ Re-compute paths/match in routing table
◦ Downside: slow/large tables
◦ Improvement next ☺
Failure Carrying in Segment Routing
IEEE Global Internet Symposium 2018
• Precompute Segments
• Match only on local failures
• Protect links, not destinations
• This paper:
◦ General MIP formulation
◦ Polynomial scheme on Hypercubes
◦ First performance experiments
Preprocessing for Segment Routing
OPODIS 2018
Thank you very much for the invitation ☺