Network Control and Management in the 100x100 Architecture
The Role of Network Control and Management
Many different network environments
Access, backbone networks
Data-center networks, enterprise/campus
Many different technologies
Longest-prefix routing, label switching, circuit switching
IP, MPLS, ATM, optical circuits
Many different policies
Routing, reachability, transit, traffic engineering, robustness
The control plane software binds these elements together and defines the network
Control Plane: The Key Leverage Point
Great Potential: control plane determines the behavior of the network
Reaction to events, reachability, services
Great Opportunities
Each network (administrative domain) has its own control plane
A radical clean-slate control plane can be deployed
– Agnostic to user data format: IPv4/v6, ethernet, circuit
– No changes to end-system software
Control plane is the nexus of network evolution
– Changing the control plane logic can smooth transitions in network technologies and architectures
A Clean-slate Design
What are the fundamental causes of network problems?
How to secure the network and protect the infrastructure?
What functionality needs to be distributed – what can be centralized?
How to reduce/simplify the software in networks?
What would a “RISC” router look like?
How to leverage technology trends?
CPU and link-speed growing faster than # of switches
Three Principles for Network Control & Management
Network-level Objectives:
Express goals explicitly
Security policies, QoS, egress point selection
Do not bury goals in box-specific configuration
[Figure: management logic driven by the reachability matrix and traffic engineering rules]
Network-wide Views:
Design network to provide timely, accurate info
Topology, traffic, resource limitations
Give logic the inputs it needs
[Figure: management logic reads state information from the network]
Direct Control:
Allow logic to directly set forwarding state
FIB entries, packet filters, queuing parameters
Logic computes desired network state, let it implement it
[Figure: management logic reads state information from the network and writes state back to it]
Overview of the 4D Architecture
Decision Plane:
All management logic is implemented on centralized servers that make all decisions
Decision Elements use views to compute data plane state that meets the objectives, then write this state directly to routers
[Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, direct control, and network-wide views]
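The decision plane loop described above (read the view, compute state that meets the objectives, write it to routers) can be sketched as follows. This is a minimal illustration, not the prototype's code: the `view` and `objectives` dictionaries, the function names, and the choice to place each deny filter at the router hosting the destination subnet are all assumptions made for the example.

```python
import heapq

def next_hops_toward(links, dest):
    """Dijkstra rooted at dest: map every other router to its next hop toward dest."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    dist, next_hop, heap = {dest: 0}, {}, [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj.get(u, []):
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                next_hop[v] = u  # v forwards to u to get closer to dest
                heapq.heappush(heap, (d + cost, v))
    return next_hop

def decide(view, objectives):
    """Compute FIB entries and packet filters for every router (hypothetical format)."""
    state = {r: {"fib": {}, "filters": []} for r in view["routers"]}
    for subnet, home in view["subnets"].items():
        for router, hop in next_hops_toward(view["links"], home).items():
            state[router]["fib"][subnet] = hop
    # Assumption: enforce each denied (src, dst) pair at the router hosting dst.
    for src, dst in objectives["deny"]:
        state[view["subnets"][dst]]["filters"].append(("drop", src, dst))
    return state

def write_state(routers, state):
    """Direct control: overwrite each router's forwarding state with the computed one."""
    for name, router_state in state.items():
        routers[name] = router_state

# Network-wide view: three routers in a line, one subnet at each end.
view = {
    "routers": ["A", "B", "C"],
    "links": {("A", "B"): 1, ("B", "C"): 1},
    "subnets": {"s1": "A", "s2": "C"},
}
objectives = {"deny": [("s1", "s2")]}  # reachability matrix entry: s1 may not reach s2
state = decide(view, objectives)
routers = {}
write_state(routers, state)
```

Routes and filters come out of one computation over one view, so a topology change re-derives both together.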
Dissemination Plane:
Provides a robust communication channel to each router – and robustness is the only goal!
May run over same links as user data, but logically separate and independently controlled
Discovery Plane:
Each router discovers its own resources and its local environment
E.g., the identity of its immediate neighbors
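As a rough sketch of how per-router discovery could feed a network-wide view, the snippet below merges neighbor reports into a link set. The report format and the rule that a link counts only when both ends report each other are assumptions for illustration; the slides do not specify either.

```python
def build_topology(reports):
    """Merge per-router neighbor reports into a set of bidirectionally confirmed links.

    reports: {router_id: set of neighbor ids that router currently hears}
    """
    links = set()
    for router, neighbors in reports.items():
        for neighbor in neighbors:
            # Assumption: a link is usable only if both ends see each other.
            if router in reports.get(neighbor, set()):
                links.add(frozenset((router, neighbor)))
    return links

reports = {
    "R1": {"R2"},
    "R2": {"R1", "R3"},
    "R3": set(),  # R3 does not hear R2, so R2-R3 is not confirmed
}
links = build_topology(reports)
```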
Data Plane:
Spatially distributed routers/switches
Can deploy with today’s technology
Looking at ways to unify forwarding paradigms across technologies
Concerns and Challenges
Distributed Systems issues
How will communication between routers and DEs survive failures in the network?
Latency means DE’s view of network is behind reality. Will the control loop be stable?
What is the overhead to/from the DEs?
What happens in a network partition?
Networking issues
Does the 4D simplify control and management?
Can we create logic to meet multiple objectives?
Fundamental Problem: Wrong Abstractions
Management Plane
• Figure out what is happening in network
• Decide how to change it
• Tools: shell scripts, traffic engineering, databases, planning tools
• Channels into the network: SNMP, netflow, modems, configs, link metrics
Control Plane
• Multiple routing processes (OSPF, BGP) on each router
• Each router with different configuration program
• Huge number of control knobs: metrics, ACLs, policy
• State held on each router: FIB, routing policies, packet filters
Data Plane
• Distributed routers
• Forwarding, filtering, queueing
• Based on FIB or labels
Good Abstractions Reduce Complexity
All decision making logic lifted out of control plane
Eliminates duplicate logic in management plane
Dissemination plane provides robust communication to/from data plane switches
[Figure: today's management, control, and data planes (configs pushed down, FIBs and ACLs in the data plane) next to the 4D decision, dissemination, and data planes (FIBs and ACLs written directly)]
Fundamental Problem: Conflation of Issues
Ideal case: all routing information flooded to all routers inside network
Robustness achieved via flooding
Reality: routing information filtered and aggregated extensively
Route filtering used to implement security and resource policies
Route aggregation used to achieve scalability
4D Separates Distributed Computing Issues from Networking Issues
Distributed computing issues → protocols and network architecture
Overhead
Resiliency
Scalability
Networking issues → management logic
Traffic engineering and service provisioning
Egress point selection
Reachability control (VPNs)
Precomputation of backup paths
4D Can Leverage Network Structure
Decision plane logic can be specialized for structure of each physical network
Distributed protocols must be prepared for arbitrary topology graphs
4D enables network logic specialized differently for access and for backbone
E.g., creating aggregation tree in access network
Advantages
Faster route computations
Retain flexibility to evolve network as needed
Support transition to 100x100 architecture
The Feasibility of the 4D Architecture
We designed and built a prototype of the 4D Architecture
4D Architecture permits many designs – prototype is a single, simple design point
Decision plane
Contains logic to simultaneously compute routes and enforce reachability matrix
Multiple Decision Elements per network, using simple election protocol to pick master
Dissemination plane
Uses source routes to direct control messages
Extremely simple, but can route around failed data links
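To illustrate how source routes can steer control messages around a failed data link, here is a minimal sketch. It assumes the sender (for example a Decision Element) knows the topology and the set of failed links; the function name, the link representation, and the use of breadth-first search are choices made for this example, not details of the prototype.

```python
from collections import deque

def source_route(links, src, dst, failed=frozenset()):
    """Return a hop list from src to dst that avoids failed links, or None if partitioned."""
    adj = {}
    for u, v in links:
        if frozenset((u, v)) in failed:
            continue  # skip links currently known to be down
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent, queue = {src: None}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]  # the whole route travels in the message header
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

links = [("DE", "R1"), ("R1", "R2"), ("DE", "R3"), ("R3", "R2")]
normal = source_route(links, "DE", "R2")
rerouted = source_route(links, "DE", "R2", failed={frozenset(("R1", "R2"))})
cut_off = source_route(
    links, "DE", "R2",
    failed={frozenset(("R1", "R2")), frozenset(("R3", "R2"))},
)
```

Because the route is carried in the message, intermediate routers need no routing state of their own to forward control traffic.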
Evaluation of the 4D Prototype
Evaluated using Emulab (www.emulab.net)
Linux PCs used as routers (650 – 800MHz)
Tested on 9 enterprise network topologies (10-100 routers each)
Example network with 49 switches and 5 DEs
Performance of the 4D Prototype
Trivial prototype has performance comparable to well-tuned production networks
Recovers from single link failure in < 300 ms
< 1 s response considered “excellent”
Faster forwarding reconvergence possible
Survives failure of master Decision Element
New DE takes control within 1 s
No disruption unless second fault occurs
Gracefully handles complete network partitions
Less than 1.5 s of outage
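The slides only say that a simple election protocol picks the master Decision Element. One way such a rule could look is sketched below: the lowest-ID DE with a recent heartbeat wins. The heartbeat table, the 0.5 s timeout, and the lowest-ID rule are all assumptions for illustration.

```python
def elect_master(last_heartbeat, now, timeout=0.5):
    """Pick the lowest-ID Decision Element whose last heartbeat is recent enough.

    last_heartbeat: {de_id: time of the last heartbeat heard, in seconds}
    Returns None when no DE is considered alive.
    """
    alive = [de for de, seen in last_heartbeat.items() if now - seen <= timeout]
    return min(alive) if alive else None

# All three DEs are healthy: DE1 is master.
before = elect_master({"DE1": 10.0, "DE2": 10.1, "DE3": 10.2}, now=10.3)
# DE1 stops sending heartbeats: DE2 takes over once the timeout expires.
after = elect_master({"DE1": 10.0, "DE2": 10.9, "DE3": 10.95}, now=11.0)
```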
Future Work
Scalability
Evaluate over 1-10K switches, 10-100K routes
Networks with backbone-like propagation delays
Structuring decision logic
Arbitrate among multiple, potentially competing objectives
Unify control when some logic takes longer than others
Protocol improvements
Better dissemination and discovery planes
Deployment in today’s networks
Data center, enterprise, campus, backbone (RCP)
Expand relationships with security
Securing the infrastructure
Using 4D as mechanism for monitoring/quarantine
Formulate models that establish bounds of 4D
Scale, latency, stability, failure models, objectives
Generate evidence to support/refute principles
Themes of Network Control & Management
Holistic Design
Many different technologies – a few common problems
Find the right abstractions: exploit commonality
Clean Slate
How much autonomy do routers/switches need?
New principles for controlling networks
Separate networking issues from distributed system issues
Leverage Network Structure
Many different types of networks exist - each with different objectives and topologies
Recent Publications
G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, J. Rexford, “On Static Reachability Analysis of IP Networks,” IEEE INFOCOM 2005, Orlando, FL, March 2005.
J. Rexford, A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, G. Xie, J. Zhan, H. Zhang, “Network-Wide Decision Making: Toward a Wafer-Thin Control Plane,” Proceedings of ACM HotNets-III, San Diego, CA, November 2004.
D. A. Maltz, J. Zhan, G. Xie, G. Hjalmtysson, A. Greenberg, H. Zhang, “Routing Design in Operational Networks: A Look from the Inside,” Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (ACM SIGCOMM 2004), Portland, Oregon, 2004.
D. A. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjalmtysson, A. Greenberg, J. Rexford, “Structure Preserving Anonymization of Router Configuration Data,” Proceedings of ACM/Usenix Internet Measurement Conference (IMC 2004), Sicily, Italy, 2004.
Questions?
Fundamental Problem: Computing Configurations is Intractable
Computing configuration files that cause the control plane to compute the desired forwarding states is intractable
NP-hard in many cases
Requires a predictive model of control plane behavior
Configuration files form a program that defines a set of forwarding states
Very hard to create a program that permits only the desired states and doesn’t transit through bad ones
[Figure: the set of forwarding states allowed by the configs; auto-adaptation leads to or through bad states, while planned responses avoid them]
4D and Today’s Networks
4D architecture and principles apply to today’s networks as well as 100x100
Enterprise/campus/university networks
Data center networks
Access/backbone networks
Greater expressivity in determining behavior
Behavior of butterfly graph gadgets under failure
Selection of traffic egress points
Direct Control Provides Complete Control
Zero device-specific configuration
Supports many models for “pushing” routes
Trivial push – convergence requires time for all updates to be received and applied – same as today
Synchronized update – updates propagated, but not applied till agreed time in the future – clock skew defines convergence time
Controlled state trajectory – DE serializes updates to avoid all incorrect transient states
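As an illustration of a controlled state trajectory, the sketch below serializes next-hop updates for one destination so that no intermediate state contains a forwarding loop. The ordering rule (update routers closest to the destination on the new paths first) is one known way to get this property and is used here as an assumption; the slides do not say how the DE orders updates. The code also assumes the new next-hop map is itself loop-free and covers the same routers as the old one.

```python
def update_order(new_next_hop, dest):
    """Order routers by hop count to dest along the NEW paths, closest first."""
    def hops(router):
        count = 0
        while router != dest:
            router = new_next_hop[router]
            count += 1
        return count
    return sorted(new_next_hop, key=hops)

def has_loop(next_hop, dest):
    """True if following next hops from any router revisits a router before dest."""
    for start in next_hop:
        seen, router = set(), start
        while router != dest:
            if router in seen:
                return True
            seen.add(router)
            router = next_hop[router]
    return False

def transient_states(old, new, order):
    """Yield the network-wide next-hop map after each single-router update."""
    state = dict(old)
    for router in order:
        state[router] = new[router]
        yield dict(state)

old = {"A": "B", "B": "D"}  # A -> B -> D
new = {"A": "D", "B": "A"}  # B -> A -> D
safe = update_order(new, "D")
safe_ok = not any(has_loop(s, "D") for s in transient_states(old, new, safe))
naive_ok = not any(has_loop(s, "D") for s in transient_states(old, new, ["B", "A"]))
```

Updating B before A would briefly leave A pointing at B and B pointing at A. Updating A first avoids that, because every updated router already has a complete new path to D.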
Fundamental Problem: Wrong Abstractions
interface Ethernet0
 ip address 6.2.5.14 255.255.255.128
interface Serial1/0.5 point-to-point
 ip address 6.2.2.85 255.255.255.252
 ip access-group 143 in
 frame-relay interface-dlci 28
router ospf 64
 redistribute connected subnets
 redistribute bgp 64780 metric 1 subnets
 network 66.251.75.128 0.0.0.127 area 0
router bgp 64780
 redistribute ospf 64 match route-map 8aTzlvBrbaW
 neighbor 66.253.160.68 remote-as 12762
 neighbor 66.253.160.68 distribute-list 4 in
access-list 143 deny 1.1.0.0/16
access-list 143 permit any
route-map 8aTzlvBrbaW deny 10
 match ip address 4
route-map 8aTzlvBrbaW permit 20
 match ip address 7
ip route 10.2.2.1/16 10.2.1.7
[Figure: size of configuration files in a single enterprise network (881 routers). x-axis: router ID, sorted by file size (0 to 881); y-axis: lines in config file (0 to 2000)]
Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues
Distributed Systems Concern: resiliency to link failures
Solution: multiple paths through routing process graph
[Figure: a graph of routing processes, each holding a route to destination D via its left link; after a link failure, one routing process switches its route to D to the right link]
Networking Concern: implement resource or security policy
Solution: restrict flow of routing information, filter routes, summarize/aggregate routes
[Figure: the same routing process graph, with routes to D filtered between routing processes]
4D Supports Network Evolution & Expansion
Decision logic can be upgraded as needed
No need for update of distributed protocols implemented in software distributed on every switch
Decision Elements can be upgraded as needed
Network expansion requires upgrades only to DEs, not every switch
Three Key Questions
Is there any transition path to deploy the 4D architecture?
Is the 4D architecture feasible?
Does the 4D architecture have more expressive power than today’s approaches to network control and management?
Deployment of the 4D Architecture
Pre-existing industry trend towards separating router hardware from software
IETF: FORCES, GSMP, GMPLS
SoftRouter [Lakshman, HotNets’04]
Incremental deployment path exists
Individual networks can upgrade to 4D and gain benefits
Small enterprise networks have most to gain
No changes to end-systems required
Reachability Example
Two locations, each with data center & front office
All routers exchange routes over all links
[Figure: routers R1 to R5 connecting the Chicago (chi) and New York (nyc) sites, each site with a data center and a front office]
[Figure: routes to chi-DC, chi-FO, nyc-DC, and nyc-FO are advertised at both sites]
[Figure: the same network with two packet filters installed]
Packet filter: Drop nyc-FO -> *, Permit *
Packet filter: Drop chi-FO -> *, Permit *
A new short-cut link added between data centers
Intended for backup traffic between centers
[Figure: the network with the new link between the chi and nyc data centers; the packet filters (Drop nyc-FO -> *, Permit *; Drop chi-FO -> *, Permit *) are unchanged]
Oops – new link lets packets violate security policy!
Routing changed, but packet filters don’t update automatically
[Figure: packets taking the new link between the data centers bypass both packet filters]
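The policy hole can be checked mechanically. The sketch below asks whether any path exists on which a given source class passes every packet filter. The figure is not recoverable from the text, so the topology is assumed: R1 and R3 are taken to be the chi and nyc data center routers, R2 and R4 the front office routers, and R5 the router joining the two front offices. The filter placement (on the links into each data center) is also an assumption. The check is conservative: it considers every unfiltered path, not only the one routing currently selects.

```python
from collections import deque

def can_reach(links, filters, src_class, src_router, dst_router):
    """True if some path lets src_class traffic travel from src_router to dst_router.

    filters: {(u, v): set of source classes dropped on packets sent from u to v}
    """
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, queue = {src_router}, deque([src_router])
    while queue:
        u = queue.popleft()
        if u == dst_router:
            return True
        for v in adj.get(u, []):
            if v not in seen and src_class not in filters.get((u, v), set()):
                seen.add(v)
                queue.append(v)
    return False

# Hypothetical topology: R1 = chi-DC, R2 = chi-FO, R3 = nyc-DC, R4 = nyc-FO, R5 joins the sites.
links = [("R1", "R2"), ("R2", "R5"), ("R5", "R4"), ("R4", "R3")]
filters = {
    ("R2", "R1"): {"nyc-FO"},  # Drop nyc-FO -> * on the way into the chi data center
    ("R4", "R3"): {"chi-FO"},  # Drop chi-FO -> * on the way into the nyc data center
}
before_new_link = can_reach(links, filters, "chi-FO", "R2", "R3")
after_new_link = can_reach(links + [("R1", "R3")], filters, "chi-FO", "R2", "R3")
```

With the short-cut link added, chi-FO traffic can enter the nyc data center through R1 without crossing the filter that was meant to stop it.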
Prohibiting Packets from chi-FO to nyc-DC
Typical response – add more packet filters to plug the holes in security policy
[Figure: additional packet filters (Drop nyc-FO -> *, Permit *; Drop chi-FO -> *, Permit *) placed on the new link between the data centers]
Packet filters have surprising consequences
Consider a link failure
chi-FO and nyc-FO still connected
[Figure: the network after a link failure, with the filters Drop nyc-FO -> * and Drop chi-FO -> * still in place]
Network has less survivability than topology suggests
chi-FO and nyc-FO still connected
But packet filter means no data can flow!
Probing the network won’t predict this problem
Allowing Packets from chi-FO to nyc-FO
Packet Filters Implement Policy
Packet filters used extensively throughout networks
Protect routers from attack
Implement reachability matrix
Define which hosts can communicate
Localize traffic, particularly multicast
Multiple Interacting Routing Processes
[Figure: a client and a server connected through routers that run OSPF and BGP processes, each feeding a FIB; EBGP sessions governed by Policy1 and Policy2 connect the network to the Internet]
The Routing Instance Graph of an 881-Router Network
Reconvergence Time Under Single Link Failure
Reconvergence Time When Master DE Crashes
Reconvergence Time When Network Partitions
Systems of Systems
Systems are designed as components to be used in larger systems, in different contexts, for different purposes, interacting with different components
Example: OSPF and BGP are complex systems in their own right, yet they are also components in the routing system of a network, interacting with each other, with packet filters, and with management tools
Complex configuration to enable flexibility
The glue has tremendous impact on network performance
State of the art: multiple interacting distributed programs written in assembly language
Lack of an intellectual framework to understand global behavior
Many Implementations Possible
Multiple decision engines
• Hot stand-by
• Divide network & load share
Distributed decision engines
• Up to one per router
Choice can be based on reliability requirements
• Dissemination plane can be in-band, or leverage OOB links
Less need for distributed solutions (harder to reason about)
• More focus on network issues, less on distributed protocols
Single redundant decision engine
Direct Expression Enables New Algorithms
OSPF normally calculates a single path to each destination D
OSPF allows load-balancing only for equal-cost paths to avoid loops
Using ECMP requires careful engineering of link weights
D
D
Decision Plane with network-wide view can compute multiple paths
• “Backup paths” installed for free!
• Bounded stretch, bounded fan-in
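One way a decision element with a network-wide view could compute multiple loop-free paths is sketched below: every neighbor strictly closer to D becomes a permitted next hop, whether or not its path cost is equal. This downstream rule is a standard technique chosen for illustration, not necessarily the algorithm behind the slide, and it does not attempt the bounded stretch or bounded fan-in properties mentioned above. The link costs and function name are assumptions.

```python
import heapq

def loop_free_next_hops(links, dest):
    """For each router, list every neighbor that is strictly closer to dest."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    dist, heap = {dest: 0}, [(0, dest)]
    while heap:  # Dijkstra rooted at the destination
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj[u]:
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    # Each hop strictly reduces the distance to dest, so no loop can form.
    return {
        u: sorted(v for v, _ in adj[u] if v in dist and dist[v] < dist[u])
        for u in adj
        if u != dest and u in dist
    }

links = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
hops = loop_free_next_hops(links, "D")
```

In this example A reaches D at cost 2 via B and cost 3 via C. Equal-cost multipath would use only B, while the rule above also installs C as a usable alternative.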
Slides under Development
Supporting Network Evolution
Logic for controlling the network needs to change over time
Traffic engineering rules
Interactions with other networks
Service characteristics
Upgrades to field-deployed network equipment must be avoided
Very high cost
Software upgrades often require hardware upgrades (more CPU or memory)
Supporting Network Evolution Today
Today’s “Solution”
Vendors stuff their routers with software implementing all possible “features”
– Multiple routing protocols
– Multiple signaling protocols (RSVP, CR-LDP)
– Each feature controlled by parameters set at configuration time to achieve late binding
Feature-creep creates configuration nightmare
– Tremendous complexity for syntax & semantics
– Mis-interactions between features is common
Our Goal: Separate decision making logic from the field-deployed devices
Supporting Network Expansion
Networks are constantly growing
New routers/switches/links added
Old equipment rarely removed
Adding a new switch can cause old equipment to become overloaded
CPU/Memory demands on each device should not scale up with network size
Supporting Network Expansion Today
Routers run a link-state routing protocol
Size of link-state database scales with # of routers
Expanding network can exceed memory limits of old routers
Today’s “Solution”
Monitor resources on all routers
Predict approach of exhaustion and then:
– Global upgrade
– Rearchitecture of routing design to add summarization, route aggregation, information hiding
Our Goal: make demands scale with hardware (e.g., # of interfaces)
Supporting Remote Devices
Maintaining communication with all network devices is critical for network management
Diagnosis of problems
Monitoring status and network health
Updating configuration or software
“the chicken or the egg….”
Cannot send device configuration/management information until it can communicate
Device cannot communicate until it is correctly configured
Supporting Remote Devices Today
Today’s “Solution”
Use PSTN as management network of last resort
Connect console of remote routers to phone modem
Can’t be used for customer premise equipment (CPE): DSL/cable modems, integrated access devices (IADs)
In a converged network, PSTN is decommissioned
Our Goal: Preserve management communication to any device that is not physically partitioned, regardless of configuration state
Network Control and Management Today
Data Plane
Distributed routers
Forwarding, filtering, queueing
Based on FIB or labels
Management Plane
• Figure out what is happening in network
• Decide how to change it
• Tools: shell scripts, traffic engineering, databases, planning tools
• Channels into the network: SNMP, netflow, modems, configs, link metrics
Control Plane
• Multiple routing processes (OSPF, BGP) on each router
• Each router with different configuration program
• Huge number of control knobs: metrics, ACLs, policy
• State held on each router: FIB, routing policies, packet filters
State everywhere!
• Dynamic state in FIBs
• Configured state in settings, policies, packet filters
• Programmed state in magic constants, timers
• Many dependencies between bits of state
State updated in uncoordinated, decentralized way!
Logic everywhere!
• Path computation built into routing protocols
• Routing policy distributed across the routers
• Packet filters placed by tools in the management plane
No way to arbitrate inconsistencies between logic!
A Study of Operational Production Networks
How complicated/simple are real control planes?
What is the structure of the distributed system?
Use reverse-engineering methodology
There are few or no documents
The ones that exist are out-of-date
Anonymized configuration files for 31 active networks (>8,000 configuration files)
6 Tier-1 and Tier-2 Internet backbone networks
25 enterprise networks
Sizes between 10 and 1,200 routers
4 enterprise networks significantly larger than the backbone networks
Learning from Ethernet Evolution Experience
Current Implementations: Everything Changed Except Name and Framing
• Switched solution
• Little use for collision domains
• Servers, routers 10 x station speed
• 10/100/1000 Mbps, 10gig coming: Copper, Fiber
[Figure: Ethernet hubs, concentrators, and switches connecting servers and a router to the WAN]
Early Implementations: Ethernet or 802.3
• Bus-based Local Area Network
• Collision Domain, CSMA/CD
• Bridges and Repeaters for distance/capacity extension
• 1-10 Mbps: coax, twisted pair (10BaseT)
[Figure: a shared LAN with a bridge/repeater (B/R) toward the WAN]
Ethernet: Re-inventing the Wheel
Becoming as service-rich and complex as IP
Traffic engineering
Reachability control and traffic isolation (VLANs)
QoS (802.1q)
Ethernet networks rediscovering the problems and solutions faced by IP networks
Is there commonality to exploit?
Switch/routers are all fundamentally table-driven
Destination addr, MPLS labels, VLANs, Circuit IDs
Control/Management Needs of 100x100 Network Architecture
Control/Management creates logical network from physical network
Supports architecture and end-to-end view of 100x100 network
Access Network
Logical level: aggregation tree between CPE and Regional Node
Physical level: network with redundant links and multiple Regional Nodes
Backbone Network
Logical level: full mesh of links among Regional Nodes
Physical level: sparse graph of fiber routes constrained by geography
100x100 Project Themes
Clean Slate Structure Holistic Design
Control/management
Re-factoring Explicitly modeled
Network-wide abstractions
Security Fundamental primitives
?? Exploit structure to achieve efficiency
Time/space correlation, end-system infrastructure coordination
Economics Information hiding
Incentives vs. structure
??