Designing Campus Networks for Manageability
Nick Feamster, Georgia Tech
Campus Networks: Large and Complex
• 3 campuses: Atlanta; Savannah; Metz, France
– 87,000 ports, 155 buildings, 1,800 fiber miles
– 2,300 wireless APs (3,900 radios)
• Campus-wide Layer 2 Mobility
• Transit for the Southeast
– Internet2, National LambdaRail (NLR), Southern Light Rail (SLR), Southern Crossroads (SoX)
Problems in Campus Networks
• Security: Access control is complex, dynamic
– Example: Campus wireless network has 4–6k active clients at any time, considerable churn
– Resonance: Dynamic access control for enterprise networks
• Virtual Network Configuration
– Today: Virtual LANs
– Today: Effects of VLAN-induced sharing
– Next steps
Dynamic Access Control in Enterprise Networks
Ankur Nayak, Hyojoon Kim, Nick Feamster, Russ Clark
Georgia Tech
Motivation
• Enterprise and campus networks are dynamic
– Hosts continually joining and leaving
– Hosts may become infected
• Today, access control is static and poorly integrated with the network layer itself
• Resonance: Dynamic access control
– Track the state of each host on the network
– Update the forwarding state of switches as each host's state changes
State of the Art
• Today's networks have many components "bolted on" after the fact
– Firewalls, VLANs, Web authentication portal, vulnerability scanner
• Separate (and competing) devices perform each of the following functions
– Registration (based on MAC addresses)
– Scanning
– Filtering and rate-limiting traffic
Correctness depends on state that is distributed across the network: many chances for incorrect or unsynchronized state!
Example: User Authentication
[Figure: authentication sequence for a new host, involving the VMPS and a web portal]
1. New MAC address appears
2. VQP query to the VMPS
3. Host placed on a VLAN with a private IP
4. Web authentication at the web portal
5. Authentication result returned
6. Host moved to a VLAN with a public IP
7. Reboot
Problems with Current Architecture
• Access control is too coarse-grained
– Static, inflexible, and prone to misconfiguration
– Must rely on VLANs to isolate infected machines
• Cannot dynamically remap hosts to different portions of the network
– Requires a new DHCP request, which for a Windows user can mean a reboot
• Correctness depends on consistent state
• Monitoring is not continuous
Resonance: Main Ideas
• Idea #1: Access control should incorporate dynamic information about hosts
– Actions should depend not only on the changing state of a host, but also on its security class
• Idea #2: Distributed dependencies should be minimized
– Incorrect behavior often results from unsynchronized state; build a consistent view of the network
Resonance: Approach
• Step 1: Specify policy: a state machine for moving hosts from one state to another
• Step 2: Associate each host with a state and a security class
• Step 3: Adjust forwarding state in switches based on the current state of each machine
– Actions from other network elements, and distributed inference, can affect network state
Example: Simple User Authentication
[Figure: state machine with states Registration, Scanning, Operation, and Quarantined; transitions labeled Successful Authentication, Failed Authentication, Not Infected, Vulnerability Detected, Infection Removed, and Still Infected After an Update]
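A minimal sketch of this per-host state machine in Python. The state and event names come from the figure; the exact source and target of each transition is our reading of it, and the `Host` class is invented for illustration.

```python
# Resonance-style per-host security state machine (sketch).
# (state, event) -> next state; edge topology is our assumption from the figure.
TRANSITIONS = {
    ("registration", "successful_authentication"): "scanning",
    ("registration", "failed_authentication"): "registration",
    ("scanning", "not_infected"): "operation",
    ("scanning", "vulnerability_detected"): "quarantined",
    ("operation", "vulnerability_detected"): "quarantined",
    ("quarantined", "infection_removed"): "operation",
    ("quarantined", "still_infected_after_update"): "quarantined",
}

class Host:
    def __init__(self, mac):
        self.mac = mac
        self.state = "registration"  # every new host starts unauthenticated

    def handle(self, event):
        # Unknown (state, event) pairs leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

h = Host("00:11:22:33:44:55")
h.handle("successful_authentication")  # -> "scanning"
h.handle("not_infected")               # -> "operation"
h.handle("vulnerability_detected")     # -> "quarantined"
```

The controller would consult this state to decide each host's forwarding treatment.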
Resonance: Step by Step
[Figure: 1. DHCP request to the DHCP server; 2. Authentication via the web portal; 3. Scanning; 4. Traffic flows to the Internet — each step mediated by the controller]
Implementation: OpenFlow/NOX
• OpenFlow: Flow-based control over the forwarding behavior of switches and routers
– Switches, a centralized controller, and end hosts
– Switches communicate with the controller through an open protocol over a secure channel
Why OpenFlow?
• Much complexity, many bugs, and unexpected behavior result from distributed state
• Solution: Centralize control and state
– Specify a single, centralized security policy
– Coordinate the mechanisms for switches
– Finer granularity of control
– Separation of control from data plane
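The control loop can be illustrated with a toy controller in plain Python. This is not the real NOX/OpenFlow API; all class and method names below are invented. The pattern is the real one, though: on a flow-table miss the switch asks the controller, which decides from the host's security state and pushes a flow entry.

```python
# Toy flow-based control loop (invented names, not the NOX API).
# Per-state actions, one possible policy for the Resonance states:
POLICY = {
    "registration": "redirect_to_portal",   # unauthenticated: web portal only
    "scanning": "redirect_to_scanner",
    "operation": "forward",                 # normal traffic
    "quarantined": "drop",
}

class Switch:
    """Stands in for an OpenFlow switch: a flow table mapping src MAC -> action."""
    def __init__(self):
        self.flow_table = {}

class Controller:
    """Centralized controller: consulted on table misses, installs flow entries."""
    def __init__(self, host_states):
        self.host_states = host_states  # src MAC -> security state

    def packet_in(self, switch, src_mac):
        state = self.host_states.get(src_mac, "registration")
        action = POLICY[state]
        switch.flow_table[src_mac] = action  # later packets match in the switch
        return action

sw = Switch()
ctrl = Controller({"aa:aa": "operation", "bb:bb": "quarantined"})
ctrl.packet_in(sw, "aa:aa")  # -> "forward"
ctrl.packet_in(sw, "bb:bb")  # -> "drop"
ctrl.packet_in(sw, "cc:cc")  # unknown host -> "redirect_to_portal"
```

When a host's state changes, the controller simply rewrites that host's entries, which is the "dynamic remapping" the current architecture lacks.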
Preliminary Deployment
Immediate Challenges
• Scale
– How many forwarding entries per switch?
– How much traffic at the controller?
• Performance
– Responsiveness: How long to set up a flow?
• Security
– MAC address spoofing
– Securing the controller (and control framework)
Length of Each Flow
Much DNS traffic is quite short; it is not tenable to install a flow-table entry for each flow.
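A back-of-the-envelope for why short flows matter: by Little's law, the steady-state number of flow-table entries is the flow arrival rate times how long each entry stays installed (flow duration plus idle timeout). The numbers below are illustrative, not measurements from this deployment.

```python
# Flow-table sizing via Little's law (illustrative numbers, not measurements).

def expected_entries(flows_per_sec, mean_duration_s, idle_timeout_s):
    """Steady-state entries = arrival rate * time each entry stays installed."""
    return flows_per_sec * (mean_duration_s + idle_timeout_s)

# A DNS exchange finishes quickly, but its entry lingers for the idle
# timeout, so the timeout dominates table occupancy:
print(expected_entries(1000, 0.5, 5.0))  # 5500.0 entries
print(expected_entries(1000, 0.5, 0.0))  # 500.0 if entries vanished at flow end
```

This is why aggregating short flows (e.g., one wildcard rule for DNS to the resolver) is attractive.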
Overhead: Flow Setup and Transfer
• RTT delays < 20 ms
• Secure-copy overhead < 1%
Summary
• A new architecture for dynamic access control
– Preliminary design
– Application to the Georgia Tech campus network
– Preliminary evaluation
• Many challenges remain
– Scaling
– Performance
– Policy languages
– Complexity metrics
Big Challenge: Policy Languages
• Holy grail: Elevate policy configuration from mechanism to a high-level language
• Our belief: Centralized control can simplify this
• Various programming-language projects (e.g., Nettle) may hold some promise
– Perhaps also formally checkable
– Maybe ConfigAssure could help?
Understanding VLAN-Induced Sharing in a Campus Network
Mukarram bin Tariq, Ahmed Mansy, Nick Feamster, Mostafa Ammar
Georgia Tech
Virtual LANs (VLANs)
• Multiple LANs on top of a single physical network
• Typically map to IP subnets
• Flexible design of IP subnets
– Administrative ease
– Sharing infrastructure among separate networks, e.g., for departments or experiments
• Sharing: IP networks may depend on the same Ethernet infrastructure
[Figure: VLANs 1–3 carried over a shared VLAN core on a single Ethernet infrastructure]
Informal Operator Survey
“[Users] can end up on ports configured for the wrong VLAN … difficult for end users to determine why their network isn't working (‘but I have a link light!’)”
“I wish for insight. Better visibility into operational details.”
“Using only the information the switch can give, [it is difficult to determine] which VLAN or VLANs are the busy ones.”
“Deploy [a] tomography tool [for the campus to isolate faulty switches].”
• Need for diagnostic tools for VLANs
• Shared failure modes among networks
• Lack of cross-layer visibility
Key Questions and Contributions
• How can we obtain visibility into the sharing of Ethernet among IP networks?
– EtherTrace: a tool for discovering the Ethernet devices on an IP path
– Passive discovery using bridge tables; does not require CDP or LLDP
• How much sharing is there in a typical network?
– Analysis of VLANs in the Georgia Tech network: 1,358 switches, 1,542 VLANs
– Finding: significant sharing
• How much does Ethernet visibility help?
– Network tomography: 2x improvement in binary tomography using Ethernet visibility
EtherTrace: IP to Ethernet Paths
• Due to spanning tree, frames from H1 and H2 are received on separate ports of the same VLAN at switches that are on the H1–H2 path
• Frames arrive on separate ports at on-path switches; on the same port at off-path switches
[Figure: example topology with switches A–F and hosts H1 and H2]
• EtherTrace automates discovery of the Ethernet path by analyzing bridge and ARP tables, iterating over each IP hop in an IP traceroute
• Works well for stable networks
• Available at: http://www.gtnoise.net/ethertrace
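The on-path test can be written down directly. This is a toy version: real EtherTrace parses bridge and ARP tables collected from the switches, and the table format below is invented for the sketch. A switch in the VLAN is on the H1–H2 path exactly when it learned the two hosts' MAC addresses on different ports.

```python
# Toy version of EtherTrace's on-path test.
# bridge_tables maps switch -> {mac: learned port}; format invented for this sketch.

def on_path_switches(bridge_tables, mac1, mac2):
    """Return switches that learned mac1 and mac2 on different ports."""
    hops = []
    for switch, table in bridge_tables.items():
        p1, p2 = table.get(mac1), table.get(mac2)
        if p1 is not None and p2 is not None and p1 != p2:
            hops.append(switch)  # frames from the two hosts arrive on separate ports
    return hops

toy_tables = {
    "A": {"h1": 1, "h2": 2},  # on-path: hosts lie on different sides
    "B": {"h1": 3, "h2": 4},  # on-path
    "D": {"h1": 1, "h2": 1},  # off-path: both MACs learned via its uplink
}
sorted(on_path_switches(toy_tables, "h1", "h2"))  # ['A', 'B']
```

Iterating this test for each IP hop of a traceroute yields the full layer-2 expansion of the IP path.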
Georgia Tech: Network Dataset
Data sources
• 1,358 switches
• 31 routers
• 79 monitoring nodes
Dataset
• Bridge tables obtained every 4 hours
• ARP tables obtained every hour
• IP traceroutes among monitoring nodes every 5 minutes
• One-day snapshot on March 25, 2008
Analysis
• Obtain Ethernet devices for IP traceroutes using EtherTrace
• Quantify the sharing of Ethernet devices among IP hops and paths
Ethernet Hops Shared Among IP Hops
• 57% of Ethernet hops carry two or more disjoint IP hops
• Maximum number of IP hops traversing a single Ethernet hop: 34 (17 when considering only disjoint IP hops)
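These statistics come from counting, for each Ethernet hop, the IP hops whose layer-2 expansion traverses it. A minimal version of that count (toy data, and ignoring the disjointness refinement):

```python
# Counting how many IP hops traverse each Ethernet hop.
# Toy data; real input would be EtherTrace's per-IP-hop Ethernet expansions.
from collections import defaultdict

def eth_hop_sharing(ip_hop_to_eth_hops):
    """Map each Ethernet hop to the number of IP hops that traverse it."""
    counts = defaultdict(int)
    for eth_hops in ip_hop_to_eth_hops.values():
        for hop in set(eth_hops):  # count each IP hop at most once per Ethernet hop
            counts[hop] += 1
    return dict(counts)

counts = eth_hop_sharing({
    "r1->r2": ["A", "B"],
    "r2->r3": ["B", "C"],
    "r3->r4": ["C"],
})
# B and C are each shared by two IP hops; A carries only one.
```

From such counts one can read off both the maximum sharing and the fraction of Ethernet hops shared by two or more IP hops.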
Application Network Tomography
• Send end-to-end probes through the network
• Monitor paths for differences in reachability
• Infer the location of the reachability problem from these differences
[Figure: a monitor probes several targets; candidate faulty hops x and y are localized by which paths fail]
Improving Diagnosis Accuracy
Metric | Using IP-level information only | Incorporating layer-2 visibility
Accuracy (is the failed hop in the diagnosed set? fraction of times the faulty edge is in the diagnosed set) | 54% | 100%
Specificity (size of the diagnosed set relative to the number of failed hops): average | 3.7 | 1.48
Specificity: 95th percentile | 9 | 1

Experiment:
1. Simulate failure of a random Ethernet hop
2. Determine the IP paths affected by the failure
3. Use binary tomography to determine the hop that is at fault
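The binary-tomography step in this experiment can be sketched as a set computation. This is a simplified single-failure version with toy paths (real implementations compute a minimal explanation over many observations): the faulty hop must lie on every failed path and on no path that still works.

```python
# Simplified single-failure binary tomography over toy paths.

def diagnose(paths, failed):
    """paths: path name -> set of hops; failed: names of paths that lost
    reachability. Returns candidate faulty hops."""
    # The faulty hop lies on every failed path...
    suspects = set.intersection(*(paths[name] for name in failed))
    # ...and on no path that still works.
    for name, hops in paths.items():
        if name not in failed:
            suspects -= hops
    return suspects

paths = {
    "m->t1": {"A", "B", "C"},
    "m->t2": {"A", "B", "D"},
    "m->t3": {"A", "E"},
}
diagnose(paths, {"m->t1", "m->t2"})  # {'B'}: shared by the failed paths only
```

Running this over IP hops versus their Ethernet expansions is what makes the accuracy and specificity comparison above possible: layer-2 visibility shrinks the suspect set.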
Summary
• Surprising amount of sharing
– On average, an Ethernet hop affects ~30 IP hops
– 57% of Ethernet hops affect two or more disjoint IP hops
• Failure of an Ethernet device affects (on average) as many IP paths as failure of an IP device
– And there are two orders of magnitude more Ethernet devices
• Cross-layer visibility improves diagnosis
– 2x improvement in accuracy and specificity
• EtherTrace: www.gtnoise.net/ethertrace
Next Steps
• Better understand the tasks VLANs are used for on campus
– Ease of mobility
– Separation of access for security reasons
• Flattening the layers: Explore eliminating VLANs where they are not needed
– E.g., can an OpenFlow-capable network eliminate the need for VLANs?
Problems in Campus Networks
• Security: Access control is complex, dynamic
– Example: Campus wireless network has 4–6k active clients at any time, considerable churn
– Resonance: Dynamic access control for enterprise networks
– Next steps: Scaling, policy languages
• Virtual Network Configuration
– Today: Virtual LANs
– Today: Effects of VLAN-induced sharing
– Next steps: VLAN alternatives