
The ATLAS Switch Tester

An application of the GETB platform

Matei Ciobotaru, CERN and “Politehnica” University of Bucharest

9-Nov-2004 Matei Ciobotaru - TDAQ Workshop - Frascati 2

Outline

Motivation
GETB architecture
  Hardware and firmware
  Control software
Features
  Descriptor-based traffic generation
  Client-Server traffic generation (Atlas-like)
Measurements
  Methodology
  Sample results
Conclusions


Motivation

Atlas TDAQ DataFlow network
  Large high-speed Layer 2 network (~700 nodes)
  Central switches: ~250 ports, chassis based
  Concentrating switches: 24 - 48 ports, pizza box
  Packet loss or excessive latency leads to a performance drop

Specific performance requirements for DataFlow switches
  Need to evaluate devices with realistic test scenarios
  See the “Switch Features” document

Testing equipment on the market is not fully adequate
  Cost per channel is high
  Not enough flexibility in defining traffic patterns
  Layer 4 - 7 functionality is not essential for Atlas

GETB Platform – Hardware

GPS Clock

2 x Gigabit Ethernet

3.3V PCI

Configuration Flash

SDRAM 2 x 64Mb

SRAM 2 x 512Kb

Altera Stratix EP1S25 FPGA

FPGA Firmware
  90% Handel-C, 10% VHDL
  Commercial IP cores (IP = Intellectual Property)
    Gigabit Ethernet MAC
    PCI Controller
  Logic utilization: approx. 85 - 90%
  Single FPGA controls 2 Ethernet ports

Multiple projects using the GETB
  GETB Tester, Network Emulator, ROS Emulator
  Common firmware / control infrastructure

GETB Platform – Control

Client (runs on any platform)
  Talks to servers using XML-RPC
  Runs user-defined scripts
  Displays statistics in a GUI
  Manages cards from multiple servers

Server (runs on Linux)
  Configures the cards
  Monitors status / collects statistics
  Handles remote client requests
  Accesses cards using IO-RCC (DataFlow package)

Distributed System
  15 servers hosting 65 cards
  Entirely based on Python

[Diagram: GETB Client (Control PC), GETB Servers and Device Under Test (DUT), connected by GPS Clock Distribution, Gigabit Ethernet Links and DUT Control]
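The client/server split described on this slide can be sketched with Python's standard XML-RPC modules. This is only an illustration of the pattern, not the GETB code: the class CardServer, the methods configure_card and get_statistics, and the statistics fields are all hypothetical names, and the counters are placeholders rather than values read from hardware.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class CardServer:
    """Hypothetical stand-in for a server process that manages a few cards."""

    def __init__(self, n_cards):
        # One record per card; a real server would read these from the hardware.
        self.cards = [{"configured": False, "frame_size": 0, "rx_frames": 0}
                      for _ in range(n_cards)]

    def configure_card(self, card_id, frame_size):
        self.cards[card_id]["configured"] = True
        self.cards[card_id]["frame_size"] = frame_size
        return True

    def get_statistics(self, card_id):
        return self.cards[card_id]


def start_server(n_cards):
    """Start an XML-RPC server on a free localhost port; return (server, url)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(CardServer(n_cards))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"


def collect_statistics(urls, n_cards, frame_size=512):
    """Client side: configure every card on every server, then poll statistics."""
    results = {}
    for url in urls:
        proxy = ServerProxy(url)
        for card_id in range(n_cards):
            proxy.configure_card(card_id, frame_size)
            results[(url, card_id)] = proxy.get_statistics(card_id)
    return results
```

A single client can manage cards on several servers simply by iterating over their URLs, which matches the "manages cards from multiple servers" role above.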

GETB Platform – Photo

[Photo: GETB Servers and the Device Under Test]

Traffic Generation / Reception

Transmission (descriptor-based)
  Packet descriptors created offline
    Any traffic distribution is supported: Poisson, random, bursts, etc.
  Multiple packet types
    Raw Ethernet, IPv4, IPv6
    Special packets: VLAN, Flow-Control
  Works at Gigabit line-speed for all packet types and sizes
  Each packet contains:
    Sequence number
    Timestamp
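As an illustration of creating descriptors offline, the sketch below draws exponentially distributed inter-departure times (a Poisson process) for a requested load. The Descriptor layout and its field names are assumptions made for this example; the slides do not describe the real descriptor format. The only fixed facts used are the Gigabit Ethernet wire overheads (8 bytes of preamble/SFD and a 12-byte minimum inter-frame gap).

```python
import random
from dataclasses import dataclass

WIRE_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte minimum inter-frame gap
NS_PER_BYTE = 8           # at 1 Gbit/s one byte occupies 8 ns on the wire


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical descriptor: destination, frame size, and departure spacing."""
    dest_port: int
    frame_size: int      # bytes, including Ethernet header and FCS
    start_delay_ns: int  # time since the start of the previous frame


def poisson_descriptors(n, frame_size, load, dest_ports, seed=0):
    """Generate n descriptors whose departures approximate a Poisson process.

    load is the target fraction of line speed (0 < load <= 1).
    """
    rng = random.Random(seed)
    wire_time_ns = (frame_size + WIRE_OVERHEAD_BYTES) * NS_PER_BYTE
    mean_gap_ns = wire_time_ns / load
    descriptors = []
    for _ in range(n):
        gap = rng.expovariate(1.0 / mean_gap_ns)
        # A port cannot start a frame before the previous one has left the wire,
        # so very short gaps are clipped to back-to-back transmission.
        gap = max(gap, wire_time_ns)
        descriptors.append(
            Descriptor(rng.choice(dest_ports), frame_size, round(gap)))
    return descriptors
```

Clipping the gaps keeps the list physically sendable but lowers the achieved load somewhat below the requested value, so this is a starting point rather than an exact generator.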

Reception
  Statistics at receiver
    Global counters
      Bytes, frames, packet types
    Counters per remote source
      Packet loss
      Average latency
      Average IPG (inter-packet gap)
  Histograms
    Packets can be classified according to source or VLAN ID
    Latency, IPG and packet size can be histogrammed
    User-defined resolution and histogram window (start offset, length)
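The receiver-side statistics can be sketched in software as follows. This is a minimal model, not the firmware logic: it assumes sequence numbers start at 0 for each source, measures the gap as arrival-to-arrival time, and uses invented names (receiver_statistics, bin_width, start_offset, n_bins) for the histogram resolution and window.

```python
from collections import defaultdict


def receiver_statistics(packets, bin_width, start_offset, n_bins):
    """Summarise received packets.

    packets: iterable of (source, seq, tx_time, rx_time) in arrival order.
    Returns (per-source summary, latency histogram).
    """
    per_source = defaultdict(lambda: {"received": 0, "max_seq": -1,
                                      "latency_sum": 0.0, "gap_sum": 0.0,
                                      "last_rx": None})
    histogram = [0] * n_bins
    for source, seq, tx_time, rx_time in packets:
        s = per_source[source]
        latency = rx_time - tx_time
        s["received"] += 1
        s["max_seq"] = max(s["max_seq"], seq)
        s["latency_sum"] += latency
        if s["last_rx"] is not None:
            s["gap_sum"] += rx_time - s["last_rx"]
        s["last_rx"] = rx_time
        # Histogram window: only latencies inside [start_offset,
        # start_offset + n_bins * bin_width) are counted.
        index = int((latency - start_offset) // bin_width)
        if 0 <= index < n_bins:
            histogram[index] += 1

    summary = {}
    for source, s in per_source.items():
        expected = s["max_seq"] + 1  # assumes sequence numbers start at 0
        gaps = s["received"] - 1
        summary[source] = {
            "lost": expected - s["received"],
            "avg_latency": s["latency_sum"] / s["received"],
            "avg_gap": s["gap_sum"] / gaps if gaps > 0 else None,
        }
    return summary, histogram
```

Counting loss from the highest sequence number seen cannot detect packets lost at the very end of a run; a real test would compare against the transmitter's counters as well.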


Atlas-like Traffic (Client-Server)

Tester ports are divided into clients and servers
Client ports send requests to server ports, asking for data
Servers send replies with single or multiple frames (bursts)
  Number of replies is configurable
Data rate is self-tuned by limiting the number of outstanding requests per client (OutReq)
  Each client monitors its number of outstanding requests
    Request sent: OutReq = OutReq + 1
    Reply received: OutReq = OutReq - 1
  Two watermarks are defined
    OutReq > High watermark: stop sending requests
    OutReq < Low watermark: resume sending requests
Recovery mechanisms are implemented on both ends to deal with packet loss

[Diagram: one client (C) exchanging requests and replies with several servers (S); the client's outstanding-request level is shown against the Low Watermark and the High Watermark]
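The watermark rule above can be written as a small state machine. The sketch below follows the two comparisons exactly as stated on the slide; the class and method names are invented for this example, and the recovery mechanisms for lost packets are not modelled.

```python
class WatermarkClient:
    """Tracks outstanding requests and applies the two-watermark rule."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.out_req = 0      # number of outstanding requests (OutReq)
        self.sending = True   # whether the client may send new requests

    def _update(self):
        if self.out_req > self.high:
            self.sending = False   # above the high watermark: stop
        elif self.out_req < self.low:
            self.sending = True    # below the low watermark: resume

    def try_send_request(self):
        """Send one request if allowed; return True if it was sent."""
        if not self.sending:
            return False
        self.out_req += 1
        self._update()
        return True

    def on_reply(self):
        """Account for one received reply."""
        self.out_req -= 1
        self._update()
```

With Low = 2 and High = 6, a client that sends as fast as it can stops after its seventh request and resumes only once OutReq has dropped to 1. Note that with the strict comparison taken literally, a low watermark of 0 would never trigger a resume, so the real boundary conditions may differ from this sketch.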


Emulation of DataFlow applications

The Client-Server mode emulates the traffic between fundamental Atlas applications: L2PU, SFI, ROS

Traffic type                   Node type  Function  Low watermark  High watermark  Number of replies
RoI Collection                 Client     L2PU      0              10              -
                               Server     ROS       -              -               1 or 2
Event Building (bus based)     Client     SFI       9              10              -
                               Server     ROS       -              -               12
Event Building (switch based)  Client     SFI       25             30              -
                               Server     ROS       -              -               1
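For use in test scripts, the table can be captured as data. The values below are copied from the table; the class names, dictionary keys and the validate helper are assumptions made for this sketch and do not come from the GETB software.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClientProfile:
    function: str
    low_watermark: int
    high_watermark: int


@dataclass(frozen=True)
class ServerProfile:
    function: str
    replies: Tuple[int, ...]  # allowed number of reply frames per request


# (client, server) pair for each traffic type in the table.
PROFILES = {
    "roi_collection": (ClientProfile("L2PU", 0, 10), ServerProfile("ROS", (1, 2))),
    "event_building_bus": (ClientProfile("SFI", 9, 10), ServerProfile("ROS", (12,))),
    "event_building_switch": (ClientProfile("SFI", 25, 30), ServerProfile("ROS", (1,))),
}


def validate(profiles):
    """Basic sanity checks before a profile is loaded onto tester ports."""
    for name, (client, server) in profiles.items():
        if not 0 <= client.low_watermark <= client.high_watermark:
            raise ValueError(f"{name}: watermarks out of order")
        if not server.replies or min(server.replies) < 1:
            raise ValueError(f"{name}: server must send at least one reply")
    return True
```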


Measurements Methodology

User defines the test in Python
  Script, pre-defined function
Multiple iterations are executed over a given space of parameters
  Log files and test data are saved automatically
  DUT is configured and its internal statistics are collected
  Consistency checks are done
Results are analyzed
  Tables and graphs are generated

[Workflow diagram: Test Description (Python) -> Run Test (collect statistics, integrity checks) -> Analysis -> Report]
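A test description of this kind might look like the sketch below: a function that iterates over a parameter space, runs one iteration per combination, applies a consistency check and saves a log per iteration. Everything here is illustrative; run_sweep, fake_iteration and the counter names are hypothetical, and fake_iteration returns placeholder counters instead of driving real hardware.

```python
import itertools
import json
from pathlib import Path


def run_sweep(run_iteration, parameter_space, log_dir):
    """Run one iteration per parameter combination and log each result."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    names = sorted(parameter_space)
    combos = itertools.product(*(parameter_space[n] for n in names))
    results = []
    for i, values in enumerate(combos):
        params = dict(zip(names, values))
        stats = run_iteration(**params)
        # Consistency check: the tester cannot receive more than it sent.
        stats["consistent"] = stats["rx_frames"] <= stats["tx_frames"]
        record = {"params": params, "stats": stats}
        (log_dir / f"iteration_{i:04d}.json").write_text(json.dumps(record))
        results.append(record)
    return results


def fake_iteration(frame_size, load):
    """Placeholder for a real test run; the counters are not measurements."""
    tx = 1000
    rx = tx if load <= 0.9 else tx - 10
    return {"tx_frames": tx, "rx_frames": rx}
```

A call such as run_sweep(fake_iteration, {"frame_size": [64, 512], "load": [0.5, 1.0]}, "logs") would execute four iterations and leave one JSON file per iteration for the analysis step.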


Results for device A

Full mesh traffic
  Each iteration with a different packet size
  All tests running at 100% line speed
Fine level of granularity
Result reveals some “features”
  Packets of certain sizes are not forwarded at line-speed
  The “standard” RFC packet sizes have zero loss
Currently discussing with the manufacturer to better understand the results

[Plot: total loss rate [%] (0 to 1) vs. frame size [byte] (0 to 1600), at 100% line-speed offered load]
  1% loss: small, but non-zero
  Packet sizes used in standard RFC benchmarks (64, 128, 512 bytes, etc.) have zero loss
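To put "100% line speed" into numbers for each frame size, the sketch below computes the maximum frame rate of a Gigabit Ethernet port and a loss rate from transmit and receive counters. The function names are chosen for this example; the overhead constants are the standard Ethernet preamble/SFD and minimum inter-frame gap.

```python
LINE_RATE_BPS = 1_000_000_000  # Gigabit Ethernet
PREAMBLE_BYTES = 8             # preamble + start-of-frame delimiter
MIN_IFG_BYTES = 12             # minimum inter-frame gap


def max_frames_per_second(frame_size):
    """Frames per second at 100% line speed for a given frame size in bytes."""
    bits_on_wire = (frame_size + PREAMBLE_BYTES + MIN_IFG_BYTES) * 8
    return LINE_RATE_BPS / bits_on_wire


def loss_rate_percent(tx_frames, rx_frames):
    """Total loss rate in percent, as plotted on the vertical axis."""
    return 100.0 * (tx_frames - rx_frames) / tx_frames
```

For 64-byte frames this gives about 1.49 million frames per second per port, and for 1518-byte frames about 81 thousand, so a sweep over frame sizes also sweeps the per-frame processing rate demanded of the switch.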


Results for device B – Full Mesh

Device B
  120 ports
  3 modules of 40 ports
Tested with fully-meshed and client-server traffic
Fully-meshed traffic
  DUT seems to perform very well under heavy load
  This type of test can be executed using any testing equipment

[Plot: Full Mesh - Throughput. Received load (% line speed) vs. intended load (% line speed), for frame sizes of 512, 1027 and 1518 bytes; annotation: 99.9 % line speed]
[Diagram: DUT under Full-Mesh traffic and DUT under Client-Server traffic]

Results for device B – Client Server

The DUT is not as good as we initially thought
The cabling of nodes on ports has a strong impact on device performance
  In real Atlas we have constraints for media type
    Optical fiber for servers (ROSs)
    Copper for clients (SFI, L2PU)
  Random node/port distribution not possible in practice
Working with the manufacturer to solve the issue

[Plot: Client-Server - Throughput. Received load (% line speed) vs. offered load (% line speed, 80 to 120), for the Linear, Balanced and Random distributions]
[Plot: Client-Server - Loss vs Offered Load. Loss ratio vs. offered load (% line speed, 80 to 120), for the Linear, Balanced and Random distributions]
  Device starts losing at 88% load at the receivers (clients)
[Diagram: client / switch-port distribution on Device B (clients and servers) for the Linear, Balanced and Random cases]

Credits / Applications

Hardware design
  Brian Martin
Firmware and control software
  Matei Ciobotaru
Contributions
  Stefan Stancu
  Micheal LeVine
  Markus Joos
Applications currently using the GETB platform:
  GETB Tester
  Atlas ROS Emulator
  LAN / WAN Network Emulator

Conclusions

The GETB platform provides a flexible environment for the design and development of Gigabit Ethernet applications
A tester able to evaluate switches for the DataFlow network has been created
  128 ports running at line-speed
Client-server traffic emulation provides a way to test devices under realistic Atlas conditions
  Evaluating the device in a worst-case scenario
We have a comprehensive set of test procedures to check that devices meet our requirements

The End

End of slide show.

Backup slides


Atlas-like Traffic – details

Watermarks: Low = 2, High = 6

[Diagram: a client exchanging requests and replies with servers; elements numbered 1 to 6]

Python Language

Features
  Object oriented
  Easy to learn, read, use
  Extremely portable
  Extensible (new modules)

    class Stack:
        "A well-known data structure"

        def __init__(self):  # constructor
            self.items = []

        def push(self, x):
            self.items.append(x)

        def pop(self):
            x = self.items[-1]
            del self.items[-1]
            return x

        def empty(self):
            return len(self.items) == 0

What is it used for?
  rapid prototyping
  scientific applications
  extension language
  web programming

GETB Firmware Architecture

TODO – block diagram


Handel-C

Hardware description language
  Like VHDL, but with syntax similar to C
  The result of the compilation is the description of an electrical circuit
  Contains built-in parallel constructs
Special features
  Arbitrary widths on variables
  Enhanced bit manipulation operators
Simple timing model
  Each assignment is one clock cycle
Support for hardware constructs
  Multiple clock domains, on-chip memories, external interfaces
  Synchronization primitives: channels, semaphores

Sequential Block

    // 3 Clock Cycles
    {
        a = 1;
        b = 2;
        c = 3;
    }

Parallel Block

    // 1 Clock Cycle
    par {
        a = 1;
        b = 2;
        c = 3;
    }

Measurements Methodology (variant)

User defines the test in Python
  Script, pre-defined function
Multiple iterations are executed over a given space of parameters
  Log files and test data are saved automatically
  DUT is configured and its internal statistics are collected
  Consistency checks are done
Results are analyzed
  Tables and graphs are generated

[Diagram labels: GETB Server (x2), DUT, XMLRPC (Control / Results), SNMP / Telnet (Control, Statistics)]
[Workflow diagram: Test Description (Python) -> Run Test (collect statistics, integrity checks) -> Analysis -> Report]

Results for device A (variant)

Full mesh traffic
  Each iteration with a different packet size
  All tests running at 100% line speed

[Plot: total loss rate [%] (0 to 1) vs. frame size [byte] (0 to 1600), at 100% line-speed offered load]

Result reveals some limitations
  Device uses a paged memory management
  Memory bandwidth is insufficient
  Certain packet sizes more affected than others

[Diagram: a small packet and a big packet stored in memory pages, showing useful packet data and wasted memory space]
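The effect of paged buffering on different frame sizes can be illustrated with a few lines of arithmetic. The slides do not give the device's page size or describe its memory system, so the page size used in the usage note is purely an assumed value, and the function names are invented for this sketch.

```python
def pages_needed(frame_size, page_size):
    """Number of fixed-size pages needed to store one frame."""
    return (frame_size + page_size - 1) // page_size


def wasted_bytes(frame_size, page_size):
    """Memory allocated to the frame but not holding packet data."""
    return pages_needed(frame_size, page_size) * page_size - frame_size


def buffer_efficiency(frame_size, page_size):
    """Fraction of the allocated memory that carries useful packet data."""
    return frame_size / (pages_needed(frame_size, page_size) * page_size)
```

With an assumed page size of 128 bytes, a 128-byte frame fills one page exactly, while a 129-byte frame needs two pages and wastes 127 bytes, roughly halving the efficiency. Frame sizes just above a page boundary cost the most memory bandwidth per useful byte, which is one way certain sizes could be affected more than others.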

