
FPGA based 10G Performance Tester for HW OpenFlow Switch

Yutaka Yasuda, Kyoto Sangyo University

Why does a HW OpenFlow switch need a (data plane) performance test?

• There are some “Conformance Test” activities

• RYU Certification

• ONF PlugFest

• How about “Performance Test” ?

• Without it, you may fall into a pitfall:

• “It works, but too slow”

Typical Story : Here is a Flow Entry on the OpenFlow HW Switch…

• There are 2 ways to handle it: by hardware (ASIC) or by software (CPU).

• Functionally it is the same, but there is a 1000x difference in latency (μsec vs. msec).

• It is not always documented (vendors basically have no reason to confess it).

• Features reply is not enough.

• It may also depend on the firmware and NOS version of the switch.

• There is no easy and straightforward way to know.

• Imagine what happens when you update your firmware, NOS, or OF app…

A real example? Here is one.

Test setup: an OpenFlow Controller, a Pica8 3290 switch, and a Spirent tester, with the Spirent test ports connected to the Pica8's ports.

1. Spirent sends 64-byte packets. 2. Pica8 has a flow entry to forward them from port #2 to port #3. 3. Spirent measures the latency.

Pica8 + Spirent experiment

In Simple and Basic configuration

• Just forwarding here to there (see below)

• Succeeded in forwarding at wire speed (1 Gbps).

• Latency : Avg. 4.26, Min 4.13, Max 4.28 (usec)

Example of the flow entry:

cookie=0x0, duration=1379.649s, table=0, n_packets=0, n_bytes=0, idle_age=1379, in_port=1,dl_src=00:10:94:00:00:05 actions=output:2

looks fine!
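For reference, an entry like this can be installed from the PicOS OVS shell roughly as sketched below (an assumption: the bridge name br0 is taken from the features-reply slide later, and in the actual test the entry may have been pushed from the controller instead):

ovs-ofctl add-flow br0 in_port=1,dl_src=00:10:94:00:00:05,actions=output:2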

Good! and Boom! results

• Good results

• MAC rewrite : no additional latency, no degradation of throughput.

• ToS rewrite : same as above

• Bad and Unexpected result

• IP rewrite : deadly slow. Avg. 140ms, Min 0.8ms, Max 350ms (boom!)

• over 1000 times slower throughput

Example of the flow entry:

cookie=0x0, duration=3.402s, table=0, n_packets=0, n_bytes=0, idle_age=3, ip,in_port=1,nw_src=192.85.1.5 actions=mod_nw_dst:192.85.1.16,output:2

Features Reply?

• It looks as if only VLAN and MAC handling are available.

• In fact….

• ToS modification runs on the hardware.

• IP modification will fall back to the software.

• You never know until you have a go.

root@PicOS-OVS# ovs-ofctl show br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000000000000111
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS STP ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP SET_DL_SRC SET_DL_DST ENQUEUE ………

You can test by yourself : several options

• Buy Ixia or Spirent : very accurate but super expensive, just overkill

• PC + 10G NIC + Software : cheap but inaccurate

• not easy to tune and calibrate well enough; you can do it, but not everyone can.

• FPGA + 10G I/F : not super-cheap but accuracy guaranteed

• time-stamped by hardware, in clock cycles (currently 8 ns)

• all time-sensitive components run independently, with the PC acting only as the mothership.

• easy setup: just plug in the board and run the controller app (see the one-line example below).
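("Run the controller app" means starting the RYU-based controller; the app file name below is only a placeholder, not the project's actual file name.)

ryu-manager tester_app.py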

My project : FPGA based solution

Board: Xilinx Kintex-7, 125 MHz / 4x 10G (SFP+) ports / hardware TCP/UDP implementation / PCIe Gen2 x1 (just for control) / enough external memory (no need to use SAS this time)

System Structure : packet generation, send, receive, and counting are all done on the FPGA board.

• The operator's browser loads the System Console (a JavaScript app) and sends the test scenario, which includes the packet generation pattern and the flow entry configuration, by HTTP POST to a REST API on the Host PC.

• The Host PC runs RYU plus a custom app as monitor/controller: it sets the packet pattern on the FPGA board and configures the target switch over the OpenFlow 1.x protocol.

• The FPGA board with its 10G interfaces sends packets to the target switch over 10G Ethernet and observes the latency.

• The results and detailed data are returned to the operator's browser.
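As a rough illustration of the console-to-controller step (the host name, port, endpoint path, and JSON field names here are assumptions for illustration, not the tester's actual API; only the flow entry string is taken from the earlier slide), the HTTP POST could look like:

curl -X POST http://host-pc:8080/test_scenario \
  -H "Content-Type: application/json" \
  -d '{"packet_pattern": {"frame_length": 64, "count": 10000},
       "flow_entries": ["ip,in_port=1,nw_src=192.85.1.5,actions=mod_nw_dst:192.85.1.16,output:2"]}'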

Experiment #1 : 10G/1G stable forwarding measurement

Match pattern : In-port X
Action : IP DST mod

Figures 1 and 2 show the ASIC-powered results. Each switch has a different distribution, but all forward within a few microseconds. Switch A clusters very steeply around 2.7 μs; the 1G switch sits around 9 μs.

Figure 1. Switch A (10G) latency distribution (x axis: latency in ns, y axis: packets).

Figure 2. Switch B (1G) latency distribution (x axis: latency in ns, y axis: packets).

(these narrow distributions also serve as a proof of the tester's accuracy)

Experiment #2 : Unexpectedly slow forwarding (software fallback)

Match pattern : In-port X + IP SRC
Action : IP DST mod

Merely adding an IP SRC match made the switch do a "software fallback" (Fig. 3): latency is around 350-500 μs. Moreover, 2.7% of the packets lie outside the graph, far to the right; the slowest exceeded 10 ms. In this case, forwarding is about 1000 times slower.

Figure 3. Switch B (1G) latency distribution in the software fallback situation (x axis: latency in ns, y axis: packets; the distribution continues further to the right).

In this case, the maximum throughput is only 16 Kpps. With 100-byte packets, that is just 12.8 Mbps.

Experiment #3 : When does it go slow?

In the case of switch B, IP matching and IP modification can each be handled by the ASIC on its own, but specifying both in a single entry makes it slow. Yet IP matching and ToS modification CAN be specified together in one entry!

Totally unexpected.... (sigh)
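To make the combinations concrete, here is a hedged sketch in the same ovs-ofctl notation as the earlier slides (the addresses, ports, and ToS value are placeholders, not the actual test values, and on switch B the entries may have been installed via the controller rather than this CLI):

# IP match + IP DST modification in one entry: falls back to software on switch B
ovs-ofctl add-flow br0 ip,in_port=1,nw_src=10.0.0.1,actions=mod_nw_dst:10.0.0.2,output:2

# IP match + ToS modification in one entry: stays on the ASIC
ovs-ofctl add-flow br0 ip,in_port=1,nw_src=10.0.0.1,actions=mod_nw_tos:32,output:2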

Use Case #1 : Hunt the "killer entry" - the unexpectedly slow-processed entry you may have

• OF apps set flow entries as they need, but they do not care about performance.

• When your service suffers performance degradation, you need to make sure that no "killer entry" exists.

In your OpenFlow network, each OF switch holds its own flow entries. The Performance Tester collects those flow entries (together with their counter info), sets them on a testbed switch along with a packet pattern, lets the packet generator send packets, observes the latency, and visualizes the result.
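For OVS-based switches, one plausible way to replay production entries on the testbed switch is sketched below (an assumption about the workflow, not the tester's built-in mechanism; depending on the OVS version, the cookie/duration/stats fields in the dump may need to be stripped before re-adding):

# on a production switch: dump the entries (grep drops the reply header line)
ovs-ofctl dump-flows br0 | grep actions= > prod-entries.txt
# on the testbed switch: load them back
ovs-ofctl add-flows br0 prod-entries.txt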

Use Case #2 : Compare "before & after" an update of the SW driver or NOS

• You need to check for performance degradation BEFORE you apply the update to the REAL network.

• Looking to the future, you also need to see what happens if the flow entries and traffic double.

In your OpenFlow network, collect the flow entries before the update (flow entries X) and after the update (flow entries Y). Load each set into the testbed switch with a packet pattern, let the packet generator send packets and observe the latency, test & record result X and result Y, then compare them.

Watch for the "killer entry". To protect yourself from an unexpected performance plunge, monitor your switches' health on your own site.

