Performance Aware SDN
Bay Area Network Virtualization Meetup
http://www.meetup.com/openvswitch/
Peter Phaal
InMon Corp.
May 2013
Why monitor performance?
“If you can’t measure it, you can’t improve it”
Lord Kelvin
[Figure: static provisioning. Capacity and demand plotted against time; annotations: $ unused capacity, $$$ service failure, $$ unused capacity.]
[Figure: dynamic provisioning. Capacity and demand plotted against time; annotation: $$ savings.]
Feedback control
[Diagram: feedback loop with Control, System and Measure blocks; signals labeled desired output and measured output.]
Controllability and Observability
The basic concept is simple. A stable feedback control system requires:
1. ability to influence all important system states (controllable)
2. ability to monitor all important system states (observable)
Controllability and Observability driving example
States: location, speed, direction, ...
Observability: it’s hard to stay on the road if you can’t see the road, or keep to the speed limit without a speedometer
Controllability: it’s hard to stay on the road or maintain speed if your brakes, engine or steering fail
Effect of delay on stability
Components of loop delay (timeline from disturbance to effect):
Disturbance: DDoS launched
Measurement delay: identify target, attacker
Planning delay: black hole, mark, re-route?
Configuration delay: switch CLI commands
Response delay: route propagation
Effect: traffic dropped
Loop delay spans everything between disturbance and effect
e.g. a slow reaction time causes a tired, drunk or distracted driver to weave; with a very slow reaction time they leave the road
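The effect of loop delay can be sketched with a toy simulation. This is only an illustration under assumed numbers: a proportional controller (gain 0.5, chosen arbitrarily) steers a value toward a target of 100 but acts on a measurement that is `delay` steps old. The function names are hypothetical.

```python
def simulate(delay, gain=0.5, target=100.0, steps=200):
    """Proportional controller acting on a measurement that is `delay` steps old."""
    xs = [0.0]  # system output over time, starting far from the target
    for t in range(steps):
        measured = xs[max(0, t - delay)]          # stale view of the system
        xs.append(xs[-1] + gain * (target - measured))
    return xs


def tail_error(xs, target=100.0, window=20):
    """Largest absolute error over the last `window` steps."""
    return max(abs(target - x) for x in xs[-window:])


for delay in (0, 1, 5):
    print(delay, tail_error(simulate(delay)))
```

With no delay or a short delay the error shrinks toward zero. With a delay of 5 steps the same controller overshoots by more on every swing, which is the weaving driver from the example above.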
What is sFlow?
“In God we trust. All others bring data.”
Dr. W. Edwards Deming
Industry standard measurement technology integrated in switches
http://www.sflow.org/
Open source agents for hosts, hypervisors and applications
The Host sFlow project (http://host-sflow.sourceforge.net) is the center of an ecosystem of related open source projects embedding sFlow in popular operating systems and applications
Standard counters
Network (maintained by hardware in network devices)
- MIB-2 ifTable: ifInOctets, ifInUcastPkts, ifInMulticastPkts, ifInBroadcastPkts, ifInDiscards, ifInErrors, ifInUnknownProtos, ifOutOctets, ifOutUcastPkts, ifOutMulticastPkts, ifOutBroadcastPkts, ifOutDiscards, ifOutErrors
Host (maintained by operating system kernel)
- CPU: load_one, load_five, load_fifteen, proc_run, proc_total, cpu_num, cpu_speed, uptime, cpu_user, cpu_nice, cpu_system, cpu_idle, cpu_wio, cpu_intr, cpu_sintr, interrupts, contexts
- Memory: mem_total, mem_free, mem_shared, mem_buffers, mem_cached, swap_total, swap_free, page_in, page_out, swap_in, swap_out
- Disk IO: disk_total, disk_free, part_max_used, reads, bytes_read, read_time, writes, bytes_written, write_time
- Network IO: bytes_in, packets_in, errs_in, drops_in, bytes_out, packets_out, errs_out, drops_out
Application (maintained by application)
- HTTP: method_option_count, method_get_count, method_head_count, method_post_count, method_put_count, method_delete_count, method_trace_count, method_connect_count, method_other_count, status_1xx_count, status_2xx_count, status_3xx_count, status_4xx_count, status_5xx_count, status_other_count
- Memcache: cmd_set, cmd_touch, cmd_flush, get_hits, get_misses, delete_hits, delete_misses, incr_hits, incr_misses, decr_hits, decr_misses, cas_hits, cas_misses, cas_badval, auth_cmds, auth_errors, threads, con_yields, listen_disabled_num, curr_connections, rejected_connections, total_connections, connection_structures, evictions, reclaimed, curr_items, total_items, bytes_read, bytes_written, bytes, limit_maxbytes
Scalable push protocol
Simple
- standard structures: densely packed blocks of counters
- extensible (tag, length, value)
- RFC 1832: XDR encoded (big endian, quad-aligned, binary), simple to encode/decode
- unicast UDP transport
Minimal configuration
- collector address
- polling interval
Cloud friendly
- flat, two tier architecture: many embedded agents → central “smart” collector
- sFlow agents automatically start sending metrics on startup and are automatically discovered
- eliminates complexity of maintaining polling daemons (and associated configurations)
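To show why a (tag, length, value) layout of big-endian 32-bit words is simple to encode and decode, here is a minimal sketch using Python's `struct` module. It is not the real sFlow datagram format: the tag numbers and counter values are made up, and the helper names are hypothetical.

```python
import struct


def encode_record(tag, counters):
    """Encode (tag, length, value) as big-endian 32-bit words, XDR style."""
    value = struct.pack(">%dI" % len(counters), *counters)
    return struct.pack(">II", tag, len(value)) + value


def decode_records(data):
    """Walk a buffer of records, stepping by length so unknown tags can be skipped."""
    records, offset = [], 0
    while offset < len(data):
        tag, length = struct.unpack_from(">II", data, offset)
        offset += 8
        counters = struct.unpack_from(">%dI" % (length // 4), data, offset)
        records.append((tag, list(counters)))
        offset += length
    return records


# Tags 1 and 2 are illustrative only, not real sFlow structure numbers.
datagram = encode_record(1, [1500, 10, 0]) + encode_record(2, [42])
print(decode_records(datagram))
```

Because every record carries its own length, a collector can step over structures it does not understand, which is what makes the format extensible.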
Counters aren’t enough
• Counters tell you there is a problem, but not why.
• Counters summarize performance by dropping high cardinality attributes:
- IP addresses
- URLs
- Memcache keys
• Need to be able to efficiently disaggregate counters by attribute in order to understand the root cause of performance problems.
• How do you get this data when there are millions of transactions per second?
Why the spike in traffic? (100Gbit link carrying 14,000,000 packets/second)
sFlow also exports random samples
• Random sampling is lightweight
• Critical path is roughly the cost of maintaining one counter: if(--skip == 0) sample();
• Sampling is easy to distribute among modules, threads, processes without any synchronization
• Minimal resources required to capture attributes of sampled transactions
• Easily identify top keys, connections, clients, servers, URLs etc.
• Unbiased results with known accuracy
Break out traffic by client, server and port (graph based on samples from 100Gbit link carrying 14,000,000 packets/second)
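The countdown idea can be sketched as follows. This is a toy model, not an sFlow agent: the traffic mix, sampling rate and helper name are assumptions, and the error bound uses the commonly quoted sampling rule of thumb, percent error <= 196 * sqrt(1/c) at 95% confidence for c samples in a class.

```python
import math
import random
from collections import Counter


def sample_stream(packets, rate, rng):
    """Yield roughly 1 in `rate` packets using a countdown with a random skip."""
    skip = rng.randint(1, 2 * rate - 1)        # mean skip equals `rate`
    for pkt in packets:
        skip -= 1                               # critical path: one decrement
        if skip == 0:
            yield pkt
            skip = rng.randint(1, 2 * rate - 1)


rng = random.Random(1)
sources = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # made-up traffic sources
packets = rng.choices(sources, weights=[6, 3, 1], k=200000)
rate = 100
samples = Counter(sample_stream(packets, rate, rng))
actual = Counter(packets)

for src, c in samples.most_common():
    estimate = c * rate                         # scale samples back up
    bound = 196 * math.sqrt(1.0 / c)            # percent error, 95% confidence
    print(src, estimate, actual[src], "+/- %.1f%%" % bound)
```

Only the sampled packets are inspected, yet the top sources and their approximate volumes fall out directly, with an error bound that depends only on the number of samples.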
Integrated data model
NETWORK: sampled packet headers (packet header, source and destination MAC address, source and destination TCP/UDP socket), I/F counters, power, temp.
HOST: CPU, memory, I/O, power, temp., adapter MACs
APPLICATION: sampled transactions, transaction counters, TCP/UDP socket
Independent agents; the sFlow analyzer joins data for an integrated view
Comprehensive visibility
Embedded monitoring of all switches, all servers, all applications, all the time
[Diagram: monitored layers: Applications (Apache/PHP, Tomcat/Java, Memcached), Virtual Servers, Virtual Network, Servers, Network]
Consistent measurements shared between multiple management tools
Software Defined Networking
“You can’t control what you can’t measure”
Tom DeMarco
Feedback control loop with sFlow and OpenFlow
[Diagram: SDN application in a monitor and control loop; annotations: low measurement delay, low planning delay, low configuration delay]
Together, sFlow and OpenFlow provide the observability and controllability to enable SDN applications targeting low latency control problems like load balancing and DDoS mitigation
sFlow and NetFlow/IPFIX in a switch
[Diagram: NetFlow/IPFIX path: packets, sample, decode, hash, flow cache, flush, send. sFlow path: packets, sample, send; i/f counters, poll, send.]
• sFlow exports packet samples immediately
• sFlow also exports interface counters
• NetFlow exports flow data on end of flow, active-timeout or inactive-timeout
• NetFlow data generation requires significant resources on the switch that can be better applied to increase the size of forwarding table(s)
• OpenFlow metering has a similar architecture to NetFlow and similar limitations
Rapid detection of large flows
[Charts: Open vSwitch traffic as shown by InMon sFlow-RT and by SolarWinds Real-Time NetFlow Analyzer (NetFlow), with active timeout intervals marked]
• sFlow does not use a flow cache, so real-time charts more accurately reflect the traffic trend
• NetFlow spikes are caused by the flow cache active-timeout for long running connections
Flow cache active timeout delays large flow detection and limits the value of the signal for real-time control applications
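A toy comparison of the two reporting styles for one long running flow. This is an idealized sketch: the rate, duration and 60 second active timeout are made-up values, sampling noise and the inactive timeout are ignored, and the function names are hypothetical.

```python
def netflow_reports(rate, duration, active_timeout):
    """Bytes exported each second by a flow cache with an active timeout."""
    reports, cached = [], 0
    for second in range(1, duration + 1):
        cached += rate
        flush = second % active_timeout == 0 or second == duration
        reports.append(cached if flush else 0)   # nothing is visible until a flush
        if flush:
            cached = 0
    return reports


def sampled_reports(rate, duration):
    """Idealized sampled view: scaled samples track the rate every second."""
    return [rate] * duration


def first_detection(reports, threshold):
    """Second at which a report first reaches the threshold, or None."""
    for second, value in enumerate(reports, start=1):
        if value >= threshold:
            return second
    return None


nf = netflow_reports(rate=100, duration=150, active_timeout=60)
sf = sampled_reports(rate=100, duration=150)
print(first_detection(nf, 100), first_detection(sf, 100), max(nf), max(sf))
```

Both views account for the same total bytes, but the flow cache reports them in bursts at each timeout, so the flow is first seen a full timeout late and the chart shows spikes rather than the trend.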
sFlow adds actionable visibility to SDN stack
[Diagram: Applications use Open APIs on the Network OS (Control Plane); Open APIs connect the Control Plane to the Data Plane and Hosts for Configuration (NETCONF/OF-Config), Forwarding and Visibility]
Actionable = complete + timely
SDN feedback control applications
[Diagram: SDN Applications (REST applications: Load Balancer, DDoS Protection) use open “northbound” REST APIs of InMon sFlow-RT (metrics, flow definitions, thresholds) and of the OpenFlow controller in the Control Plane; open “southbound” APIs connect the Control Plane to the Data Plane and Hosts]
Connect switches to central control plane
e.g. connect Open vSwitch to OpenFlow controller
ovs-vsctl set-controller br0 tcp:10.0.0.1:6633
e.g. connect Open vSwitch to sFlow analyzer
ovs-vsctl -- --id=@sflow create sflow agent=eth0 \
  target=\"10.0.0.1:6343\" sampling=1000 polling=20 \
  -- set bridge br0 sflow=@sflow
Minimal configuration to connect switches to controllers; intelligence resides in external software
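When many switches need the same two settings, the commands can be generated rather than typed. A minimal sketch, assuming the same bridge, interface and addresses as the slide; the helper names are hypothetical, and the functions only build argument lists without running anything.

```python
def controller_cmd(bridge, controller):
    """argv for pointing a bridge at an OpenFlow controller."""
    return ["ovs-vsctl", "set-controller", bridge, "tcp:%s:6633" % controller]


def sflow_cmd(bridge, agent_if, collector, sampling=1000, polling=20):
    """argv for creating an sFlow record and attaching it to a bridge."""
    return ["ovs-vsctl", "--", "--id=@sflow", "create", "sflow",
            "agent=%s" % agent_if,
            'target="%s:6343"' % collector,      # quotes are part of the value
            "sampling=%d" % sampling, "polling=%d" % polling,
            "--", "set", "bridge", bridge, "sflow=@sflow"]


for argv in (controller_cmd("br0", "10.0.0.1"),
             sflow_cmd("br0", "eth0", "10.0.0.1")):
    print(" ".join(argv))
```

Each list could be handed to `subprocess.check_call()` on the switch host to apply the setting.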
Performance aware SDN application examples
• DDoS mitigation
• Load balancing large flows
• Optimizing virtual networks
• Packet brokers
Emerging opportunity for SDN applications to leverage embedded instrumentation and control capabilities and deliver scalable performance management solutions
Many more use cases, particularly if you broaden the scope to the SDDC (software defined data center)
Use Case 1: DDoS mitigation
Components of a DDoS flood attack
1. Command to attack target sent over control network
2. Large number of compromised hosts start sending traffic to target
3. Traffic converges on access link, overwhelming capacity and denying access
[Charts: packets/second, before and after; annotations: threshold, attack starts, detected, control implemented, attack eliminated]
Sustained 6M packets/second attack (30 Gigabits/second)
http://blog.sflow.com/2013/03/ddos.html
Also: http://packetpushers.net/openflow-1-0-actual-use-case-rtbh-of-ddos-traffic-while-keeping-the-target-online/
ECMP/LAG multi-path traffic distribution
http://static.usenix.org/event/nsdi10/tech/full_papers/al-fares.pdf
index = hash(packet fields) % linkgroup.size
selected_link = linkgroup[index]
Hash collisions reduce effective cross sectional bandwidth
A 1:1 subscription ratio doesn’t eliminate blocking: collision probabilities are high, even with large numbers of paths
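The hash-based link selection above can be sketched in a few lines. CRC32 is only a stand-in here, since the hash functions used by switch hardware vary, and the link names and flows are made up.

```python
import zlib
from collections import Counter


def select_link(flow, links):
    """Pick a link from the flow's header fields; a flow always maps to one link."""
    key = "|".join(str(field) for field in flow).encode()
    return links[zlib.crc32(key) % len(links)]


links = ["link0", "link1", "link2", "link3"]
# (source IP, destination IP, protocol, source port, destination port)
flows = [("10.0.0.%d" % i, "10.0.1.1", 6, 40000 + i, 80) for i in range(1, 5)]
load = Counter(select_link(f, links) for f in flows)
print(load)   # any link with a count above 1 carries colliding flows
```

The selection ignores flow size and current link load, so two large flows that hash to the same link share its bandwidth while other links may sit idle.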
Birthday Paradox
What is the chance that at least two people in a room will share a birthday?
50/50 chance with 23 people, virtual certainty with the 90 people in this room. This is a “paradox” because the probability seems remarkably high considering that there are 365 possible birthdays (366 if you include Feb 29) and 23 people represents just over 6% of the theoretical maximum and 90 people is only 25%.
http://en.wikipedia.org/wiki/Birthday_problem
ECMP/LAG/MLAG collision probabilities are surprisingly high
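The same calculation covers birthdays and hashed flows. A small sketch, assuming each item lands independently and uniformly in one of the slots; the function name is hypothetical.

```python
def collision_probability(items, slots):
    """Chance that at least two of `items` land in the same one of `slots`."""
    if items > slots:
        return 1.0
    p_distinct = 1.0
    for i in range(items):
        p_distinct *= (slots - i) / float(slots)
    return 1.0 - p_distinct


print(round(collision_probability(23, 365), 3))   # people and birthdays
print(round(collision_probability(4, 8), 3))      # large flows and equal-cost paths
```

For 23 people the result is just over one half. For 4 large flows hashed over 8 equal-cost paths it is about 0.59, so a collision is more likely than not even though the paths outnumber the flows two to one.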
http://research.microsoft.com/en-us/UM/people/srikanth/data/imc09_dcTraffic.pdf
http://blog.sflow.com/2013/01/load-balancing-lagecmp-groups.html
http://blog.sflow.com/2013/03/ecmp-load-balancing.html
http://blog.sflow.com/2013/02/sdn-and-large-flows.html
Small number of long lived large flows responsible for bulk of load
https://datatracker.ietf.org/doc/draft-ietf-opsawg-large-flow-load-balancing/
Use SDN controller to detect and eliminate collisions by adjusting forwarding paths
Use Case 2: Load balancing large flows
Not just ECMP, also LAG/MLAG, Wireless and WAN etc.
Network virtualization
http://bradhedlund.com/2013/01/28/network-virtualization-a-next-generation-modular-platform-for-the-virtual-network/
Overlay network of tunnels used to carry inter-hypervisor traffic across physical network, GRE, NVGRE, VxLAN etc.
Network topology hidden behind APIs, not just Nicira/VMware, but OpenStack Quantum etc.
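[Diagram: packet paths between VMs through a virtual firewall (FW) and load balancer (LB), with path segments labeled a to d; traffic matrix axes: VM From, VM To]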
Virtual network packet paths
Lack of topology awareness results in random placement of VMs
Traffic matrix on physical network appears random
Random traffic patterns appear to need a completely flat physical network topology, i.e. non-blocking between all node pairs (fat tree, Clos)
- expensive (cost, power, space)
- limited scalability
- limited flexibility
- not easily achieved in practice (large flows)
[Figure: traffic matrix, VM From versus VM To, with the largest tenant marked]
Use Case 3: Network aware VM placement
[Diagram: placement of VM1 and VM2]
SDN provides network topology and load information that allows VMs to be optimally placed
Resulting sparse, highly structured traffic matrix efficiently maps into physical resources, allows SDN controller to deliver predictable performance and workload isolation
http://blog.sflow.com/2013/04/multi-tenant-traffic-in-virtualized.html
Extension of OpenFlow to optical circuit switches allows network to be rewired for actual demand
Traffic is sparse for each tenant
Traffic within each tenant’s virtual network is similarly sparse, e.g. Hadoop above, or scale out web, cache, storage clusters
http://research.microsoft.com/en-us/UM/people/srikanth/data/imc09_dcTraffic.pdf
Use Case 4: Packet broker
ONS 2013: DEMon Software Defined Distributed Ethernet Monitoring System, Rich Groves, Microsoft
http://blog.sflow.com/2013/04/sdn-packet-broker.html
• Offloading basic traffic monitoring to sFlow takes pressure off capture network
• Visibility into traffic volumes before triggering capture
• Trigger capture based on fields outside the OpenFlow 12-tuple (e.g. tenant IP, VNI etc.)
• Trigger on very large match lists (lists of compromised hosts etc.)
Testbed setup
Prerequisites
Java 1.6+
Python + Requests library (http://docs.python-requests.org/en/latest/)
cURL
Download and install sFlow-RT
wget http://www.inmon.com/products/sFlow-RT/sflow-rt.tar.gz
tar -xvzf sflow-rt.tar.gz
cd sflow-rt
Demo data from small test lab
XenServer Pool: 10.0.0.16, 10.0.0.20, 10.0.0.28
Hyper-V: 10.0.0.30
VMs: 10.0.0.1, 10.0.0.59, 10.0.0.114, 10.0.0.121, 10.0.0.150 - 10.0.0.154, 10.0.0.158, 10.0.0.160, 10.0.0.162
Applications: HTTP, Memcached, PHP, Java
vSwitches: Open vSwitch, Hyper-V extensible vSwitch
Other sFlow sources: 10.0.0.253
sFlow-RT REST API commands
Metrics
Single metric:
/metric/10.0.0.253/1.ifinoctets/json
agent = 10.0.0.253, datasource = 1, value = ifinoctets, type = json
Metric query:
/metric/10.0.0.16;10.0.0.20/max:load_one,min:load_one/json?os_name=linux,windows&cpu_num=2
scope = 10.0.0.16;10.0.0.20, function = max:, min:, values = load_one, type = json, filter = os_name=linux,windows&cpu_num=2
scope: ALL or semicolon delimited list (unordered)
values: comma delimited list (ordered) with optional prefix max:, min:, sum:, avg:, var:, sdev:, med:, q1:, q2:, q3:, iqr: or any:
filter: select metrics based on attribute values
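The query layout above can be assembled with a small helper. A sketch only: `metric_url` is a hypothetical name, the base address is the localhost port used elsewhere in these slides, and no URL escaping is attempted because the example values are plain tokens.

```python
def metric_url(base, scope, values, filters=None):
    """Build /metric/{scope}/{values}/json with an optional attribute filter."""
    scope_part = scope if isinstance(scope, str) else ";".join(scope)
    url = "%s/metric/%s/%s/json" % (base, scope_part, ",".join(values))
    if filters:
        url += "?" + "&".join("%s=%s" % (k, ",".join(v)) for k, v in filters)
    return url


print(metric_url("http://localhost:8008",
                 ["10.0.0.16", "10.0.0.20"],
                 ["max:load_one", "min:load_one"],
                 [("os_name", ["linux", "windows"]), ("cpu_num", ["2"])]))
```

Passing the string ALL as the scope queries every agent, and a single metric is just a one-element values list such as 1.ifinoctets.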
Defining flow metrics
Name: metric name for results, i.e. /metric/ALL/name/json
Keys: ipsource,ipdestination,tcpsourceport,tcpdestinationport
  ipsource.1: tunneled address
  mask:ipsource:24: mask address (e.g. result = 10.1.2.0/24)
  uuidsrc: UUID associated with flow source
Value: frames, bytes, duration
  avg:bytes: average packet size
  count:ipsource: count of distinct source addresses
Filter: tcpdestinationport=80,8080 & destinationgroup=internal
  hostnamesrc ~ '.*vm.*' | sourcegroup != external
Command examples
Create, Read, Update, Delete, List (CRUDL)
Create (HTTP PUT/POST)
curl -H "Content-Type:application/json" -X PUT --data "{keys:'ipsource',value:'bytes'}" \
  "http://localhost:8008/flow/src/json"
curl --data "name=src&keys=ipsource&value=bytes" -X POST "http://localhost:8008/flow/html"
Read (HTTP GET)
curl "http://localhost:8008/flow/src/json"
{ "keys": "ipsource", "n": 5, "value": "bytes"}
Update (HTTP PUT)
curl -H "Content-Type:application/json" -X PUT --data "{keys:'macsource',value:'frames'}" \
  "http://localhost:8008/flow/src/json"
Delete (HTTP DELETE)
curl -X DELETE "http://localhost:8008/flow/src/json"
List (HTTP GET)
curl "http://localhost:8008/flow/json"
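The same CRUDL pattern can be expressed with the Python 3 standard library. This sketch only builds the requests and does not send them; `flow_request` is a hypothetical helper, and the body is strict JSON from `json.dumps` rather than the relaxed syntax in the curl examples.

```python
import json
from urllib.request import Request

RT = "http://localhost:8008"   # sFlow-RT address used in the curl examples


def flow_request(action, name=None, definition=None):
    """Build (but do not send) the HTTP request for one CRUDL action on a flow."""
    method = {"create": "PUT", "read": "GET", "update": "PUT",
              "delete": "DELETE", "list": "GET"}[action]
    url = RT + ("/flow/json" if action == "list" else "/flow/%s/json" % name)
    body = json.dumps(definition).encode() if definition is not None else None
    headers = {"Content-Type": "application/json"} if body else {}
    return Request(url, data=body, headers=headers, method=method)


req = flow_request("create", "src", {"keys": "ipsource", "value": "bytes"})
print(req.get_method(), req.full_url, req.data)
```

Passing a request to `urllib.request.urlopen()` would send it to a running sFlow-RT instance.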
http://inmon.com/products/sFlow-RT/demo.sh
Use browser for exploration
Tail events using HTTP “long” polling
extras/tail_log.py

import requests

eventurl = 'http://localhost:8008/events/json?maxEvents=10&timeout=60'
eventID = -1
while 1 == 1:
    # long poll: the request returns when new events arrive or the timeout expires
    r = requests.get(eventurl + "&eventID=" + str(eventID))
    if r.status_code != 200: break
    events = r.json()
    if len(events) == 0: continue

    # newest event is first; remember its ID, then print oldest first
    eventID = events[0]["eventID"]
    events.reverse()
    for e in events:
        print(str(e['eventID']) + ',' + str(e['timestamp']) + ',' + e['thresholdID'] + ',' + e['metric'] + ',' + str(e['threshold']) + ',' + str(e['value']) + ',' + e['agent'] + ',' + e['dataSource'])
REST operation flow chart
DDoS Protection:
  define address groups
  define flows
  define thresholds
  while(running) {
    receive threshold event
    monitor flow
    deploy control
    monitor flow
    release control
  }
[Diagram: numbered REST operations 1 to 8 between the DDoS Protection application, the sFlow-RT REST API and the OpenFlow Controller REST API; label: Define flow keys]
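The loop body in the flow chart can be sketched with the REST calls replaced by plain callables. Everything here is hypothetical scaffolding: `protect`, its parameters and the made-up readings stand in for calls to the sFlow-RT and OpenFlow controller REST APIs.

```python
def protect(events, flow_rate, deploy, release, threshold, wait=lambda: None):
    """Run the flow chart's loop body once per threshold event.

    events:    iterable of flow keys that crossed the threshold
    flow_rate: callable(key) -> current rate, polled to monitor the flow
    deploy / release: callables(key) that add / remove the control
    wait:      pause between polls (e.g. a sleep in real use)
    """
    log = []
    for key in events:                        # receive threshold event
        if flow_rate(key) < threshold:        # monitor flow: already subsided?
            continue
        deploy(key)                           # deploy control
        log.append(("deploy", key))
        while flow_rate(key) >= threshold:    # monitor flow while control is active
            wait()
        release(key)                          # release control
        log.append(("release", key))
    return log


readings = iter([900, 800, 500, 100])         # made-up rates for one attacking flow
controls = set()
log = protect(["1.2.3.4,10.0.0.1"], lambda key: next(readings),
              controls.add, controls.discard, threshold=400)
print(log, controls)
```

Keeping measurement and control behind callables lets the loop be exercised without a switch, and the two monitoring steps make sure a control is only held while the flow stays above the threshold.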
Large flow detection script (initialization)
extras/ddos_log.py

import requests
import json

rt = 'http://localhost:8008'
groups = {'external':['0.0.0.0/0'],'internal':['10.0.0.0/8']}
flows = {'keys':'ipsource,ipdestination','value':'frames','filter':'sourcegroup=external&destinationgroup=internal'}
threshold = {'metric':'ddos','value':400}
r = requests.put(rt + '/group/json',data=json.dumps(groups))
r = requests.put(rt + '/flow/ddos/json',data=json.dumps(flows))
r = requests.put(rt + '/threshold/ddos/json',data=json.dumps(threshold))
...
Large flow detection script (monitor events)
...
eventurl = rt + '/events/json?maxEvents=10&timeout=60'
eventID = -1
while 1 == 1:
    r = requests.get(eventurl + "&eventID=" + str(eventID))
    if r.status_code != 200: break
    events = r.json()
    if len(events) == 0: continue
    eventID = events[0]["eventID"]
    events.reverse()
    for e in events:
        thresholdID = e['thresholdID']
        if "ddos" == thresholdID:
            # look up the flow metric that triggered the event to find its top key
            r = requests.get(rt + '/metric/' + e['agent'] + '/' + e['dataSource'] + '.' + e['metric'] + '/json')
            metrics = r.json()
            if len(metrics) > 0:
                evtMetric = metrics[0]
                evtKeys = evtMetric.get('topKeys',None)
                if(evtKeys and len(evtKeys) > 0):
                    topKey = evtKeys[0]
                    key = topKey.get('key', None)
                    value = topKey.get('value',None)
                    print(e['metric'] + "," + e['agent'] + ',' + key + ',' + str(value))
Next Steps
Build your own test bed:
1. sFlow-RT is already installed on your laptop, capable of monitoring thousands of switches (remember to turn off demo.pcap and enable UDP port 6343 on your firewall)
2. Enable sFlow in your network (OVS, Hyper-V, physical switches, http://sflow.org/products/network.php)
3. Install Host sFlow agents http://host-sflow.sourceforge.net/ + application agents: Apache, NGINX, HAProxy etc. http://host-sflow.sourceforge.net/relatedlinks.php
4. You don’t have to have access to a physical test lab: build a Mininet / Open vSwitch virtual test lab, e.g. http://blog.pythonicneteng.com/2013/05/pytapdemon-part-3-pro-active-monitoring.html
Engage with the broader sFlow community:
https://lists.sourceforge.net/lists/listinfo/host-sflow-discuss
http://groups.google.com/group/sflow
http://groups.google.com/group/sflow-rt
Find out more about sFlow:
http://sflow.org/
http://blog.sflow.com/
Questions?