Washington University in St. Louis
Endsystem Support for Network Virtualization
Fred Kuhns
Fred Kuhns - 04/20/23
Overview
• Network Diversification:
– Virtual network (vNet): distinct vNets coexist within a common physical network.
– Diversification layer: common substrate to share physical resources and provide isolation.
– A vNet is composed of one or more virtual routers (VRs) interconnected by virtual links. Virtual routers and links are direct analogues of their physical counterparts: network resources are virtualized.
– An end-system implements vNet protocols and provides connectivity services within a virtualized network protocol environment (virtual end-system). The virtual end-system provides mechanisms for protocol implementation, resource control and isolation.
• The diversification layer provides two levels of abstraction (i.e. two core services):
– Substrate: encapsulates existing layer 1 and layer 2 technologies and provides a single, consistent framework for implementing virtualized links and routers.
  • substrate link: abstraction that behaves like a point-to-point connection between communicating end points. Provides isolation services to the different virtual networks using a common substrate link.
  • substrate router: a physical device which forwards network traffic based on its vNet membership. Provides sharing and isolation services to disparate vNets and hosts virtual routers.
– Virtual: framework providing a simple model and set of interfaces for implementing virtual networks. The model defines virtual routers, end-systems and links. The goal is for virtual links and routers to behave like their physical counterparts.
  • virtual link: simulates the behavior of a dedicated point-to-point link interconnecting virtual end points (virtual routers and/or virtual end-systems). A virtual link is implemented by one or more substrate links.
  • virtual router: implements a particular vNet's routing logic. The underlying substrate router provides the necessary isolation and resource management functions.
vNet Discussion
• Develop examples/scenarios
– intranet (no routing): use existing model
– internet (routing): use diversified networking model
– use Ethernet and virtualized IP as running example
• Model: simple
– network devices interconnected through simplex, point-to-point links
– common link-layer protocol used for delivering packetized data to a neighbor (not end-to-end but hop-by-hop)
• Achieving this model
– context: shared heterogeneous physical network, links and packet switches (aka packet routers)
– objectives:
  • partition physical resources into virtual links and routers
  • isolation mechanisms for virtualized resources
  • bind virtualized resources to network instances
Context: Network Diversification (vNets)
[Figure labels: substrate router, virtual router, substrate link, virtual link, virtual end-system]
Simulates Star Topology for Substrate Links
[Figure: switched LAN with one VLAN per host (VLANX1, VLANX2, …, VLANXN); labels vNetX, VR1]
Internetworking over a diversified network
Substrate function with Ethernet:
• Substrate links: use VLANs to provide the equivalent of a virtualized “wire” connecting an end-system to a specific substrate router.
• Sharing and isolation:
- All vNet traffic uses its assigned VLANs.
- Use priority queuing (802.1p/Q).
- All intranet traffic uses lower-priority queues.
• Resource management:
- LAN: use admission control (static or dynamic) to provide bandwidth guarantees to vNet traffic.
- End-system: the substrate layer on the end-system enforces per-VLAN and per-vNet bandwidth constraints.
• Virtual links: in this simple example there is exactly one virtual link for each substrate link.
• Each host-to-substrate-router connection is assigned a distinct VLAN, so N hosts implies N VLANs on the Ethernet.
• An alternative is to define one VLAN tree for each protocol suite (i.e. vNet).
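The per-host VLAN and priority scheme can be illustrated with the 802.1Q tag that would carry it. This is a minimal sketch, assuming standard 802.1Q framing (TPID 0x8100 followed by a 16-bit TCI made of a 3-bit priority, a 1-bit DEI and a 12-bit VLAN ID). The function names, the starting VLAN ID 100 and the priority values 5 and 0 are illustrative assumptions, not values from the slides.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def vlan_tag(vid: int, pcp: int, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag: TPID followed by the TCI (PCP | DEI | VID)."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= pcp <= 7:
        raise ValueError("priority (PCP) must be in 0..7")
    tci = (pcp << 13) | ((dei & 1) << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def assign_host_vlans(hosts, base_vid=100):
    """One VLAN per host-to-substrate-router connection: N hosts -> N VLANs.

    base_vid is a hypothetical starting VLAN ID.
    """
    return {host: base_vid + i for i, host in enumerate(hosts)}


# Hypothetical priorities: vNet traffic high (5), intranet traffic low (0).
VNET_PCP, INTRANET_PCP = 5, 0

vlans = assign_host_vlans(["hostA", "hostB", "hostC"])
tag = vlan_tag(vlans["hostB"], VNET_PCP)
```

With the alternative of one VLAN tree per vNet, `assign_host_vlans` would be keyed by vNet rather than by host.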
Traffic isolation with priority aware substrate
[Figure: Ethernet hub with High and Low priority TX queues on each port. vNet traffic goes to High, otherwise Low. Traffic classes shown: vNet traffic (internet); local traffic (intranet); local control/management and legacy internet traffic. The link towards VR1 (vNetX) is labelled "all vNet traffic".]
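The High/Low queue behaviour in this figure can be sketched as a two-queue transmit scheduler. The slide does not name the queueing discipline, so strict priority is an assumption here, and the class and method names are hypothetical.

```python
from collections import deque


class PriorityPort:
    """Transmit side of one port with a High and a Low FIFO queue."""

    def __init__(self):
        self.high = deque()  # vNet (internet) traffic
        self.low = deque()   # intranet, control/management, legacy traffic

    def enqueue(self, frame, is_vnet):
        """vNet traffic goes to High, everything else to Low."""
        (self.high if is_vnet else self.low).append(frame)

    def dequeue(self):
        """Strict priority (assumed): Low is served only when High is empty."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


port = PriorityPort()
port.enqueue("intranet-1", is_vnet=False)
port.enqueue("vnet-1", is_vnet=True)
port.enqueue("vnet-2", is_vnet=True)
order = [port.dequeue() for _ in range(4)]
```

Strict priority can starve the Low queue, which is one reason the slides pair it with admission control for vNet traffic.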
Substrate Link as a VLAN Tree
[Figure: Ethernet switched LAN in which a single VLAN (VLANX) forms a tree]
Internetworking over a diversified network
Substrate function with Ethernet:
• Substrate links: the VLAN creates a tree interconnecting all end-systems to the substrate router. The substrate end-point then uses the VLAN tag and source/destination address to realize the logical point-to-point substrate link.
• Sharing and isolation:
- No change from the substrate star topology; the only difference is the shared VLAN domain. The scheme still provides traffic isolation.
• Resource management:
- Same.
• Virtual links: same.
Multiple Substrate Links
[Figure: Ethernet switched LAN with three VLAN trees: VLANdgram, VLANmed and VLANhigh]
Internetworking over a diversified network
Substrate function with Ethernet:
• Substrate links: three VLAN trees are used for all virtual net traffic to/from a substrate router:
- Low priority: default for best-effort traffic
- Medium priority: for virtual nets with soft performance requirements (average bandwidth)
- High priority: for isochronous or low-delay, interactive applications
• Sharing and isolation: see above.
• Resource management: see above.
• Virtual links: same.
Multiple vNets per Host
[Figure: end-system with virtual interfaces multiplexed onto substrate interfaces (ether addr/VLAN) across an Ethernet LAN (VLAN1, VLAN2, VLAN3) towards VR1; each virtual link carries a VLI. Annotation: the VLAN tag and destination address identify the substrate router; the VLI tag is used to route the packet.]
The full model:
• Substrate link: connects an end-system to a substrate router. Virtualization of a physical cable or wire: a packet enters one end, exits the other and is opaque within.
- Simplex or duplex?
• Substrate interface: end-system abstraction
- Ethernet: <interface, VLAN, dst_addr>
- Tunnel: MPLS, IP, IPsec, L2TPv3, GRE, AToM
- Layer 2: ATM, others?
• Virtual link: logical interconnection (virtual wire) of adjacent vNet nodes.
- Point-to-point, simplex or duplex?
• Virtual interface: end-system abstraction representing one end of a virtual link. The substrate defines the mechanism for multiplexing onto a common substrate link, for example a virtual link identifier (VLI) in a substrate header.
- Simplex or duplex?
Multiple next hop VRs
[Figure: Host A (enetAddrA), a member of vNetX and vNetY, attached through an Ethernet switched LAN to substrate routers 1, 2 and 3 (enetAddrSR1, enetAddrSR2, enetAddrSR3) over VLANA1, VLANA2 and VLANA3. Virtual routers shown: vNetX VR1, VR2 and VR3, and vNetY VR1. VLI labels: VLI1 and VLI2 on one substrate link, VLI1 on each of the other two.]
Multiple Next Hop Virtual Routers:
• Substrate link: one per (end-system, substrate router) pair.
• Substrate interface: three substrate interfaces:
SI1 = <eth0, VLANXA1, enetAddrSR1>
SI2 = <eth0, VLANXA2, enetAddrSR2>
SI3 = <eth0, VLANXA3, enetAddrSR3>
• Virtual link: logical point-to-point connection between a virtual end-system and its access virtual router. Since we model a point-to-point link there is no need for link addresses.
• Virtual interface: representation of a virtual link on the end-system. The substrate assigns each virtual link a virtual link identifier (VLI) that is unique per substrate link.
VI1 = <SI1, VLI1>
VI2 = <SI1, VLI2>
VI3 = <SI2, VLI1>
VI4 = <SI3, VLI1>
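The SI and VI tuples can be written down as plain data types. This is a sketch only: the class names, field names and the `substrate_header` helper are hypothetical, and VLIs are represented as small integers for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstrateInterface:
    """<interface, VLAN, dst_addr> for an Ethernet substrate link."""
    device: str
    vlan: str
    dst_addr: str


@dataclass(frozen=True)
class VirtualInterface:
    """One end of a virtual link: a substrate interface plus a VLI."""
    si: SubstrateInterface
    vli: int


SI1 = SubstrateInterface("eth0", "VLANXA1", "enetAddrSR1")
SI2 = SubstrateInterface("eth0", "VLANXA2", "enetAddrSR2")
SI3 = SubstrateInterface("eth0", "VLANXA3", "enetAddrSR3")

VI1 = VirtualInterface(SI1, 1)
VI2 = VirtualInterface(SI1, 2)
VI3 = VirtualInterface(SI2, 1)
VI4 = VirtualInterface(SI3, 1)


def substrate_header(vi: VirtualInterface):
    """Fields the substrate would need to send on this virtual link."""
    return (vi.si.device, vi.si.vlan, vi.si.dst_addr, vi.vli)


# A VLI is only unique per substrate link: VI1 and VI3 share VLI 1
# but remain distinct because their substrate interfaces differ.
all_vis = {VI1, VI2, VI3, VI4}
```

Because the VLI is scoped to a substrate link, the pair (substrate interface, VLI) is what identifies a virtual link on the end-system.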
TCP/IP as an Example Protocol
[Figure: end-system running IP as the vNet protocol inside the vNet framework. eth0 is the standard Ethernet interface (Ethernet device) used for direct connections; vint0 is a virtual interface carried over VLANX on eth0. Packets cross the Ethernet LAN to substrate router SR1, identified by the Ethernet destination address and VLAN, with a VLI selecting the virtual link.]
Substrate Interface:
• Directly connected: destination IP address + ARP = enet addr
• Gateway: (gateway's IP + ARP = enet addr) + VLAN
Virtual Interface:
• Directly connected: not used; the model applies only to internetworking
• Gateway: VLI assigned by the substrate. How is this integrated into the current ARP/route interface?

destination prefix   gateway (router address)   virtual interface   substrate interface    ll_info
192.168.12.0/24      0.0.0.0                    (not used)          eth0                   ARP
* (default)          192.168.12.254             vint0               (eth0,VLAN,ethDst)     VLI
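A lookup over the two routing entries on this slide might look as follows. This is a sketch under two assumptions: the table is searched by longest-prefix match, and the "*" default entry is written as 0.0.0.0/0. The `Route` type and `lookup` function are hypothetical names.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    gateway: str
    virtual_if: Optional[str]   # None: not used for directly connected hosts
    substrate_if: object        # "eth0" or an (interface, VLAN, ethDst) tuple
    ll_info: str                # how the link-layer details are resolved


ROUTES = [
    Route(ipaddress.ip_network("192.168.12.0/24"), "0.0.0.0", None, "eth0", "ARP"),
    # "*" (default) is written here as 0.0.0.0/0.
    Route(ipaddress.ip_network("0.0.0.0/0"), "192.168.12.254", "vint0",
          ("eth0", "VLAN", "ethDst"), "VLI"),
]


def lookup(dst: str) -> Route:
    """Return the most specific route whose prefix contains dst."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in ROUTES if addr in r.prefix]
    return max(candidates, key=lambda r: r.prefix.prefixlen)


local = lookup("192.168.12.7")   # directly connected: eth0 + ARP
remote = lookup("10.1.2.3")      # via gateway: vint0 + VLI
```

The open question on the slide remains: where the VLI lives relative to the existing ARP and route interfaces. Here it is simply a tag in the `ll_info` column.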
Using Tunnels for the substrate layer
• Need to look into the various tunneling approaches/protocols. How can we leverage these?
– MPLS and MPLS VPNs
– Generic Routing Encapsulation (GRE): RFC 2784
– Point-to-Point Tunneling Protocol (PPTP)
– Secure VPN
– Any Transport over MPLS (AToM)
– IP tunnel
– IPsec VPNs
– Layer 2 Tunneling Protocol version 3 (L2TPv3)
  • version 3 is a draft standard
  • RFC 2661: Layer 2 Tunneling Protocol
– 802.1Q tunneling: Cisco 802.1Q-in-Q VLAN Extension Services
• What about MPLS over IP tunnels: what was done there?
OS Kernel Block Diagram
[Figure: kernel block diagram between User Space (Applications) and Hardware. Hardware dependent layer: configuration (registers, MMU (TLB, cache, VM), bus and peripherals), system exception handlers, HW interrupt/exception, OS ISR demux, device drivers (eth0, uart, timer), callback, util. Hardware independent layer: scheduler, tasks and task management; clock handler (process accounting, scheduling, time management); callout queue and SW interrupt (AST); TCP poll; File Interface ops, FS management, buffer cache, open files; device independent I/O and Basic I/O Interface; Socket Interface over the TCP module (TCP1, TCP2, …, TCPn), UDP, raw IP, IP routes and Ethernet; TC/AST, qdisc, tx queue and rx queue above the device driver. Processing contexts: interrupt processing and AST processing.]
User or kernel Space protocols?
• Each has pros and cons
• User space protocols:
– easier to implement and debug
– easier to introduce new protocols (not tightly dependent on the socket layer knowing about the new protocol)
– easier to isolate and protect protocols and apps from each other (leverage process model)
• Kernel level protocols:
– easier to integrate into the existing framework (simplifies support for system interface functions like select/poll)
– simplifies intra-protocol security and protection (since the protocol runs within the trusted kernel)
– simplifies (well, more direct) kernel demultiplexing to the correct protocol context (endpoint)
– increased efficiency
User Space Protocol Implementation
• Uncommon outside of the high-performance community, which wants zero-copy and specialized demux keys.
• Problems: asynchronous processing, life cycle, authentication and demultiplexing to endpoints
– latency in delivering packets (e.g. acks) to user space
– increased overhead in per-packet processing before a drop/keep decision is made
– processing received acks
– timeouts and retransmissions
– establishing connections and security: snooping, masquerading
– supporting select and poll
– protocols where a connection may outlive the process (TCP's TIME_WAIT)
– global routing and address resolution tables
– global connection tables
  • need to know what other ports are being used (locally)
  • accepting/rejecting new connections
Assumptions
• Assumptions:
– Applications using different VNs (or no VN) will need to communicate using the various IPC mechanisms
– We want to manage all aspects of Network I/O but not the use of other traditional resources (memory, files etc)
– CPU, memory and interface bandwidth controlled at the virtual net granularity
– intra-VN, implementers should have the mechanisms to support QoS and Security
– simple mechanism for adding new protocols/VNs
User Space Protocols
• Chandramohan A. Thekkath, Thu D. Nguyen, Evelyn Moy, Edward D. Lazowska, Implementing network protocols at user level, IEEE/ACM Transactions on Networking (TON), v.1 n.5, p.554-565, Oct. 1993
• Chris Maeda, Brian Bershad, Protocol Service Decomposition for High-Performance Networking, Proceedings of the 14th ACM Symposium on Operating Systems Principles, p.244-255, December 1993
• Aled Edwards, Steve Muir, Experiences implementing a high performance TCP in user-space, Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, p.196-205, 1995
• Kieran Mansley, Engineering a User-Level TCP for the CLAN Network, Proceedings of the ACM SIGCOMM workshop on Network-I/O convergence: experience, lessons, implications, p.228-236, 2003
User-space protocols: Global Issues
• Routing: direct packets to/from the correct endpoint/interface
– How is traffic demultiplexed and sent to the correct endpoint/process?
  • In-kernel filters
– Where are the routing tables and how are they maintained?
  • Route fixed when the connection is established, or located in shared memory
• Control (using IPv4 as an example):
– Address resolution protocols/tables?
– Other control protocols, for example ICMP, IGRP, others?
– Where are the routing protocols implemented?
• Management:
– Must manage a protocol's namespace (for example, port numbers in IPv4).
– Common programming technique: allow the protocol instance to select the local address part
  • specify port = 0 and addr = 0, then the implementation will assign correct values
– Passive connect model?
  • In IPv4 a server listens on a port (host:port:proto) for a connection request. To establish a connection a unique (to the end-system) port number is assigned and a new socket allocated.
– Socket-oriented system calls must be supported. On UNIX, non-blocking I/O with select and poll must be supported.
– Connection lifetime may outlast the process.
  • For example TCP TIME_WAIT, or simply waiting for a final ack or resending if no ack is received.
• Security: we must provide sufficient mechanisms for protocol developers
– implementations must be able to guard against masquerading and eavesdropping
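The "port = 0 and addr = 0" technique from the Management bullet can be seen with the ordinary in-kernel IPv4 stack, which is the behaviour a user-space protocol's namespace manager would have to reproduce. The sketch below uses the standard Python socket API; the function name `bind_any` is hypothetical.

```python
import socket


def bind_any():
    """Bind a UDP socket with addr = 0 and port = 0 and report what was assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("0.0.0.0", 0))      # wildcard address, port chosen by the stack
        return s.getsockname()      # (address, port) as seen after bind


addr, port = bind_any()
```

The stack picks an unused local port at bind time, which requires a table of ports in use across all processes. With a user-space protocol that table would have to live in the control daemon or in shared state.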
User Space: Configurations
• Given these global issues there are two likely configurations:
– all traffic passes through a common protocol daemon in user space
– a control daemon implements a basic set of control functions while a user library implements the majority of data path functions
– prior work has shown the latter approach to be superior.
• Having all traffic pass through a common protocol daemon implies at least one extra copy operation (kernel -> daemon -> user process).
• A better solution is for a daemon to insert relatively simple packet filters in the kernel for established connections, which direct incoming packets to endpoints and filter outgoing packets from them.
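The filter-based configuration can be sketched as a small demultiplexing table that the control daemon populates and the kernel consults per packet. The class, its methods and the string addresses are hypothetical, and keying filters first by VLI and then by a (local, remote) pair is an assumption about the filter layout.

```python
class IncomingFilters:
    """Per-VLI filters mapping a connection to the endpoint that owns it."""

    def __init__(self):
        self._by_vli = {}  # vli -> {(local, remote): endpoint}

    def insert(self, vli, local, remote, endpoint):
        """Called by the control daemon once a connection is established."""
        self._by_vli.setdefault(vli, {})[(local, remote)] = endpoint

    def remove(self, vli, local, remote):
        """Called by the control daemon when the connection goes away."""
        self._by_vli.get(vli, {}).pop((local, remote), None)

    def demux(self, vli, local, remote):
        """Return the owning endpoint, or None when no filter matches.

        What happens on None (hand to the daemon, or drop) is left open here.
        """
        return self._by_vli.get(vli, {}).get((local, remote))


filters = IncomingFilters()
filters.insert(vli=1, local="A:5000", remote="B:80", endpoint="socket-7")
hit = filters.demux(1, "A:5000", "B:80")
wrong_vnet = filters.demux(2, "A:5000", "B:80")
filters.remove(1, "A:5000", "B:80")
after_close = filters.demux(1, "A:5000", "B:80")
```

Only the daemon modifies the table, while the data path performs lookups and delivers directly to the endpoint, avoiding the extra kernel -> daemon -> user copy.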
User-Space: Passive Open
[Figure: application with the vnetX protocol library and the vnetX control daemon (namespace, lifecycle, connections) above the socket layer, connection filters, vnet demux and Ethernet.]
0. listen/accept (passive open)
1. connection request (in)
2. ack (out)
3. insert incoming and outgoing filters for the vnetX connection
4. new connection
5. data, established connections (data copy)
• Outgoing: compare against the connection-specific outgoing filter.
• Incoming: use the VLI to access the incoming filters, then demux to a filter set and/or socket.
User-Space: Active Open
[Figure: application with the vnetX protocol library and the vnetX control daemon (namespace, lifecycle, connections) above the socket layer, connection filters, vnet demux and Ethernet.]
0. connect
1. connection request (out)
2. ack (in)
3. insert incoming and outgoing filters for the vnetX connection
4. new connection
5. data, established connections (data copy)
• Outgoing: compare against the connection-specific outgoing filter.
• Incoming: use the VLI to access the incoming filters, then demux to a filter set and/or socket.
User-Space: Datagram (Connectionless)
[Figure: application with the vnetX protocol library and the vnetX control daemon (namespace, lifecycle, connections) above the socket layer, connection filters, vnet demux and Ethernet.]
0. open(any)
1. insert incoming and outgoing filters for the vnetX connection
2. new connection (local address)
3. data, established connections (data copy)
• The daemon fills in the local address and binds it to the socket. No restrictions on the destination.
• Outgoing: compare against the “connection”-specific outgoing filter.
• Incoming: use the VLI to access the incoming filters, then demux to the socket. In this case only the local part is used.
User-Space: Datagram (Connectionless)
[Figure: application with the vnetX protocol library and the vnetX control daemon (namespace, lifecycle, connections) above the socket layer, connection filters, vnet demux and Ethernet.]
0. open(local and remote addr)
1. insert incoming and outgoing filters for the vnetX connection
2. new connection (local and remote)
3. data, established connections (data copy)
• The daemon fills in both the local and destination addresses. The destination is restricted.
• Outgoing: compare against the “connection”-specific outgoing filter.
• Incoming: use the VLI to access the incoming filters, then demux to the socket.
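The two datagram cases differ only in whether the outgoing filter restricts the destination. A minimal sketch, assuming the filter stores the local address and an optional remote address; the `OutgoingFilter` class and the address strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutgoingFilter:
    local: str
    remote: Optional[str] = None  # None: open(any), destination unrestricted

    def permits(self, src: str, dst: str) -> bool:
        """Check an outgoing packet against this filter."""
        if src != self.local:
            return False  # packet claims a local address the daemon did not bind
        return self.remote is None or dst == self.remote


unrestricted = OutgoingFilter(local="A:5000")                # open(any)
restricted = OutgoingFilter(local="A:5000", remote="B:53")   # open(local and remote addr)

results = (
    unrestricted.permits("A:5000", "C:9"),   # any destination allowed
    restricted.permits("A:5000", "B:53"),    # the one allowed destination
    restricted.permits("A:5000", "C:9"),     # destination restricted
    unrestricted.permits("A:6000", "C:9"),   # wrong source address
)
```

Checking the source address on the outgoing path is one way to address the masquerading concern raised under the global issues.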
User-Space: App exits
[Figure: application with the vnetX protocol library and the vnetX control daemon (namespace, lifecycle, connections) above the socket layer, connection filters, vnet demux and Ethernet.]
1. connection close (out)
2. ack (in/out)
3. remove filters
• Figure annotations: drop; TCP enters TIME_WAIT after close.
Extensible protocol frameworks in the kernel
• Herbert Bos, Bart Samwel, Safe Kernel Programming in the OKE, Proceedings of the fifth IEEE Conference on Open Architectures and Network Programming, June 2002
OKE
• Context: for performance reasons it is useful to permit third parties to load optimized modules into the kernel.
• Problem: third-party code is untrusted, so loading it into the kernel can compromise system security and reliability. A safe execution environment like Java could be used, but it incurs expensive runtime checks.
• Solution: create a set of mechanisms and policies to permit non-root users to safely load untrusted application modules into kernel space with minimal impact on runtime performance.
– Safety: use a trusted compiler to enforce policies (constraints). The constraints are designed to ensure the untrusted module will not adversely affect the kernel (core and loadable modules) or unrelated processes.
– User privileges: vary the enforced constraints based on user privileges (customizable language)
– Termination: well-defined termination boundaries to protect system state
– Enforcement: static and dynamic checks; language extensions
– Ease of use: familiar development environment using Cyclone (a type-safe C extension) and kernel modules.
• Contribution: definition of a safe kernel programming environment that meets competing needs:
– performance
– safety
– ease of use
– hosted in a commodity OS
Considerations
• Identified areas where modules may impact system behavior
1. Program correctness: language restrictions for safety and enforced coding conventions
2. Memory access: static and dynamic enforcement of memory access rules
3. Kernel module access: static and dynamic enforcement of kernel module (interface) access restrictions
4. Resource usage: Bounded (deterministic or limited)
Pushing protocols into the Kernel
• Positives:
– All the issues associated with user-space protocols simply go away: tables are global and state has the lifetime of the kernel
– Performance, efficiency, existing code base
– Enhances intra-protocol security
– Simplifies integration with existing network I/O subsystems and interfaces
• Negatives:
– Isolation: more difficult to isolate the system from protocol instances. Inter-protocol isolation is difficult.
– Security: proving trust/security is more difficult
– Implementation and debugging are more difficult in the kernel
Kernel-Space Protocols
[Figure, marked "Rework!" by the author: kernel-space protocol architecture between User Space (Applications) and Hardware. Applications use the Socket Interface with PF_INET endpoints (tcp:port, udp:port, raw IP) and PF_VNET endpoints (vnet:ep), and the File Interface (FS management, buffer cache, open files) with /dev/vnet and /dev/protoX. In the kernel, TCP/IP (TCP1, TCP2, …, TCPn, UDP, raw IP, IP routes) sits beside vnet protocol modules with their own state tables, reached through the vnet Socket I/O Interface (vnet ops). Below them: route to interface, vnet Demux, VLAN, Ethernet, the eth device driver (eth0), HW interrupt/exception and SW interrupt handling.]