How to Build a 100 Gbps DDoS Traffic Generator
DIY with a Single Commodity Off-the-Shelf (COTS) Server
Surasak [email protected]
DISCLAIMER
THE FOLLOWING CONTENT HAS BEEN APPROVED FOR APPROPRIATE AUDIENCES
THE PRESENTATION HAS BEEN RATED RESTRICTED
NON-TECHNICAL VIEWERS REQUIRE AN ACCOMPANYING MENTOR
Any action based on the examples in this talk must be carried out without disturbing other computer systems, so as not to fall under Section 10, which may constitute an offense under Section 12, as defined by the Computer-Related Crime Act (No. 2) B.E. 2560 (2017).
USE AT YOUR OWN RISK
Why DDoS traffic generator?
• R&D tool for: network behavior, traffic logging, traffic analysis, anti-DDoS
• Testing network middleboxes: IDS, IPS, firewall, router
• Synthetic traffic, but close to realistic traffic
HW vs. SW generator

Item         Dedicated Hardware   Server with Software
Precision    High                 Moderate
Latency      Low                  Moderate
Capability   Full max rate        Near max rate
Cost         High                 Economical
Goal: 100 Gb/s DDoS traffic generator
Constraints:
• A single COTS server
• A single 100 GigE NIC (not 10x10 GigE)
Outline
• Introduction
  • DDoS understanding
  • Ethernet revisiting
• HW and SW solution for a 100 Gb/s generator
  • Server and components
  • Linux networking stack
  • Open source SW generators
• Testbed and performance results
Broad types of DDoS
• Volume-based attacks: saturate the bandwidth of the attacked site; measured in bits per second (bps)
• Application-layer attacks: low-and-slow attacks to crash targets; measured in requests per second (rps)
• Protocol attacks: consume actual target resources, or intermediate communication equipment (firewalls, load balancers, etc.); measured in packets per second (pps)
Evolution of Ethernet
• Capacity and speed requirements on data links keep increasing
• Big Data, AI require more bandwidth
• Servers have begun to be capable of sustaining 100G to memory
[Timeline: 10 Mb/s (1983) → 100 Mb/s (1995) → 1 Gb/s (1998) → 10 Gb/s (2002) → 40/100 Gb/s (2010) → 25 Gb/s (2015) → 200/400 Gb/s, IEEE Std 802.3bs (2017): a 40,000x increase in 34 years]
Understanding Ethernet wire speed
• Wire speed refers to the theoretical peak packet bit rate
• Q: What is the maximum packet rate (pps) that can be generated at a specific Ethernet speed?
Frame sizes matter. Two options for consideration:
1. Minimum frame size (S): a large number of frames per unit time
2. Maximum frame size (L): a small number of frames per unit time
[Diagram: one second filled with many small frames (S) vs. few large frames (L), each separated by the inter-frame gap (IFG)]
Ethernet frame-by-frame delivery (bytes): Preamble (7), SFD (1), DA (6), SA (6), Type (2), Payload (46 to 1,500), FCS (4), IFG (12)

Field          Min size, S (bytes)   Max size, L (bytes)
Preamble+SFD   8                     8
Dst Address    6                     6
Src Address    6                     6
Type           2                     2
Payload        46                    1,500
FCS            4                     4
IFG            12                    12
Total          84                    1,538

(Min frame = 64 bytes, max frame = 1,518 bytes, excluding preamble/SFD and IFG)
Max @ 100 GigE
• Maximum frame rate for 64-byte frames over a 100 GigE link (84 bytes = 672 bits on the wire):
  M = Speed/Size = 100x10^9 / 672 = 148,809,523 pps
  Maximum throughput: T = M x 64 x 8 = 76.19 Gb/s
• Maximum frame rate for 1,518-byte frames over a 100 GigE link (1,538 bytes = 12,304 bits on the wire):
  M = Speed/Size = 100x10^9 / 12,304 = 8,127,438 pps
  Maximum throughput: T = M x 1518 x 8 = 98.69 Gb/s
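The wire-speed arithmetic above can be checked with a short script (a sketch; the 8-byte preamble/SFD and 12-byte IFG overheads are the ones from the frame table):

```python
# Frame-rate math for 100 GigE, using the per-frame overhead from the slide:
# on-wire size = preamble+SFD (8 bytes) + frame + IFG (12 bytes).

LINK_BPS = 100e9  # 100 GigE line rate

def max_pps(frame_bytes):
    """Maximum frames per second for a given frame size at line rate."""
    wire_bits = (frame_bytes + 8 + 12) * 8
    return LINK_BPS / wire_bits

def throughput_gbps(frame_bytes):
    """Throughput carried by the frames themselves, excluding overhead."""
    return max_pps(frame_bytes) * frame_bytes * 8 / 1e9

print(f"64B  : {max_pps(64):,.0f} pps, {throughput_gbps(64):.2f} Gb/s")
print(f"1518B: {max_pps(1518):,.0f} pps, {throughput_gbps(1518):.2f} Gb/s")
```

Note how the small-frame case loses almost a quarter of the link to per-frame overhead, which is why pps, not bps, is the hard part.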
100 GigE performance: max frame rate at different speeds

Rate (Gb/s)   Frames/s (min: 64B)   Frames/s (max: 1518B)
1             1.48 M                81 K
10            14.88 M               812 K
40            59.52 M               3.25 M
100           148.81 M              8.12 M
Challenge for packet processing
[Diagram: packet 1 arrives at T1 and is looked up and processed; packet 2 arrives at T2; the inter-packet arrival time T2 - T1 bounds how long the system has to handle each packet]
Frame count and timing with 64-byte frames

Rate (Gb/s)   Frames/s (Million)   Inter-packet arrival time (ns)
1             1.48                 672
10            14.88                67.2
40            59.52                16.8
100           148.81               6.72
Time/CPU budget at 100 Gb/s
• At 148.81 Mpps, the time budget for processing a single packet is:
  1/(148.81x10^6) = 6.72 nanoseconds
• Considering a server with a 3 GHz CPU, how many clock cycles are available to handle each minimum-size frame at the 100 Gb/s packet rate?
  6.72x10^-9 x 3x10^9 ~ 20 clock cycles
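The budget works out as follows (a sketch; the 3 GHz clock is the slide's assumed CPU speed):

```python
# Per-packet time and CPU-cycle budget at 100 Gb/s with 64-byte frames.

PPS = 148.81e6   # max 64B frame rate at 100 Gb/s, from the table above
CPU_HZ = 3e9     # assumed 3 GHz core, as on the slide

time_budget_ns = (1 / PPS) * 1e9       # seconds per packet, in ns
cycle_budget = (1 / PPS) * CPU_HZ      # clock cycles per packet

print(f"time budget : {time_budget_ns:.2f} ns/packet")
print(f"cycle budget: {cycle_budget:.1f} cycles/packet")
```

Roughly 20 cycles is less than a single cache miss, which is why the conventional per-packet kernel path cannot keep up.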
To deliver 100 GigE: hardware capability

4 crucial components:
1. CPU
2. Interconnect: QPI, 156 Gb/s
3. PCI bus: PCIe 3.0, up to 40 lanes/socket (252 Gb/s for x16)
4. Memory bus: 4-channel DDR4-2133 MHz, up to 546 Gb/s
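The slide's bandwidth figures can be sanity-checked from first principles (a sketch; PCIe 3.0 runs 8 GT/s per lane with 128b/130b encoding, and each DDR4 channel has a 64-bit data bus):

```python
# Sanity-check the per-component bandwidth figures.

# PCIe 3.0: 8 GT/s per lane, 128b/130b line encoding.
pcie3_lane_gbps = 8 * 128 / 130           # ~7.88 Gb/s per lane, per direction
pcie3_x16_gbps = pcie3_lane_gbps * 16     # ~126 Gb/s per direction
                                          # (~252 Gb/s counting both directions)

# DDR4-2133, 4 channels, 8 bytes per transfer per channel.
ddr4_gbps = 2133e6 * 8 * 4 * 8 / 1e9      # transfers/s * bytes * channels * bits

print(f"PCIe 3.0 x16 : {pcie3_x16_gbps:.1f} Gb/s per direction")
print(f"DDR4-2133 x4 : {ddr4_gbps:.0f} Gb/s")
assert pcie3_x16_gbps > 100  # headroom for one 100 GigE NIC in an x16 slot
```

So a single x16 slot and the memory subsystem both have headroom for 100 Gb/s; the bottleneck is software, not the buses.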
The OS's obstacle
• The traditional OS network stack is problematic
• Not designed with this speed in mind
• Carries many general-purpose networking features: filtering, connection tracking, memory management, VLANs, overlays, and process isolation
• Not scalable, even with the many CPU cores available these days
http://www.makelinux.net/kernel_map/
Overhead in the Linux kernel
• Socket-based system calls
• Context switching and blocking I/O
• Data copying from kernel to userspace
• Interrupt handling
• High latency
Linux Network Stack Walkthrough (2.4.20)
https://wiki.openwrt.org/doc/networking/praxis
Conventional stack vs. kernel bypass
• Let's bypass the kernel and work directly with NICs
• Allows applications to access the hardware directly
• Uses a set of libraries for fast packet processing
• Reduces latency while processing more packets
• Handles packets within a minimum number of CPU cycles
• But...
  • Provides only a very basic set of functions (memory management, ring buffers, poll-mode drivers)
  • Requires reimplementation of the other IP stack features
[Diagram: Conventional (sockets-based): Application (user) -> Sockets -> TCP/IP stack -> Network driver (kernel) -> Hardware. Kernel bypass (RDMA-based): Application plus packet library (user) talk directly to Hardware, skipping the kernel's TCP/IP stack and network driver]
Zero copy (ZC) with Remote Direct Memory Access
[Diagram: in the conventional sockets-based path, data is copied three times between NIC and application: device buffer -> socket buffer -> app buffer. In the kernel-bypass (RDMA-based) path, the application's packet library and the NIC share a single buffer, so no copies are needed]
Scalable with multicores
[Diagram: the kernel runs on core 0; each of cores 1-3 runs its own application and packet library, bound to its own NIC Tx/Rx queue pair (Tx0/Rx0, Tx1/Rx1, ...)]
Fast (userspace) packet processing
• Kernel bypass is also known as:
  • Fast packet processing
  • High-performance packet I/O
  • Data plane processing acceleration framework
              DPDK              Netmap            PF_RING
OS            Linux, FreeBSD    FreeBSD, Linux    Linux
License       BSD               BSD               LGPL + paid
Language      C                 C                 C
Use case      Appliances, NFV   NFV, Router       Packet capture, IDS/IPS
NIC vendors   Several           Intel             Intel
Support       Community         Community         Company
DPDK
• Data Plane Development Kit
• A set of libraries and drivers for fast packet processing
• Main libraries:
  • multicore framework
  • huge page memory
  • ring buffers
  • poll-mode drivers
• Currently managed as an open-source project under the Linux Foundation
http://dpdk.org/
DPDK-based open source projects
• SPDK (Storage Performance Development Kit): libraries for writing high-performance, scalable, user-mode storage applications
• Packet-journey: Linux router; scalable software routing, proven with 500k routes
• pktgen-dpdk: software-based traffic generator
• Virtual multilayer switch, integrated into various cloud platforms
• Carrier-grade, integrated, open source platform to accelerate Network Function Virtualization (NFV)
• I/O services framework for network and storage software, with Vector Packet Processing
• The stateful traffic generator for L1-L7
• Flexible stateless/stateful traffic generator for L4-L7
What can be built with DPDK?
• Switch/router
• Stateless and stateful firewall
• IDS/IPS
• Load balancer
• Traffic recorder
• Fast internet scanners
• Stateless packet generator
• Stateful, application-like flow generator
• IPsec VPN gateway
• Accelerated key-value DB
• Accelerated NAS
TRex
• DPDK-based stateful/stateless traffic generator (L4-L7)
• Replays real traffic (pcap), scalable to 10K parallel streams
• Supports about 10-30 Mpps per core, scalable with the number of cores
• Scales to 200 Gb/s on one COTS server
• Use cases:
  • High-scale benchmarks for stateful networking gear (firewall/NAT/DPI)
  • Generating high-scale DDoS attacks
  • High-scale, flexible testing for switches
  • Scale tests for huge numbers of clients/servers
https://trex-tgn.cisco.com/
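Given the 10-30 Mpps/core range quoted above, a rough core count for 64-byte line rate at 100 Gb/s can be estimated (a sketch; actual per-core throughput depends heavily on the traffic profile):

```python
# Estimate how many cores TRex needs to hit 100 GigE 64-byte line rate,
# using the 10-30 Mpps/core range quoted on the slide.
import math

TARGET_PPS = 148.81e6  # 64-byte line rate at 100 Gb/s

def cores_needed(per_core_pps):
    """Cores required to reach TARGET_PPS at a given per-core packet rate."""
    return math.ceil(TARGET_PPS / per_core_pps)

print(cores_needed(10e6))  # pessimistic per-core rate
print(cores_needed(30e6))  # optimistic per-core rate
```

This is why the testbed below uses a dual-socket, 10-core-per-socket server: the pessimistic case needs most of its 20 cores.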
TRex sample traffic config file
• 255 clients talking to 255 servers
root:~/trex-core/scripts# cat cap2/dns.yaml
- duration : 1.0
  generator :
    distribution : "seq"
    clients_start : "16.0.0.1"
    clients_end : "16.0.0.255"
    servers_start : "48.0.0.1"
    servers_end : "48.0.0.255"
    clients_per_gb : 201
    min_clients : 101
    dual_port_mask : "1.0.0.0"
    tcp_aging : 0
    udp_aging : 0
  cap_info :
    - name: cap2/dns.pcap
      cps : 10.0
      ipg : 10000
      rtt : 10000
      w : 1
40 Gb/s traffic generator reports
• pktgen-dpdk: https://www.chelsio.com/wp-content/uploads/resources/T5-40Gb-Linux-DPDK.pdf
• TRex: https://trex-tgn.cisco.com/trex/doc/trex_stateless_bench.html
• Warp17: https://github.com/Juniper/warp17
Where are the 100 Gb/s results?
Testbed
• HW: Dell R430
  • 2x Intel Xeon E5-2640 v4, 2.40 GHz (dual socket, 10 cores each)
  • 64 GB RAM (4x16 GB DDR4 2400 MHz)
  • 1.5 TB NL-SAS
  • DPDK-supported 100 Gb/s NIC
• SW
  • CentOS 7.3, kernel 3.10
  • DPDK 17.05.2
  • TRex 2.29
[Diagram: two Dell R430 servers, sender and receiver, connected by a 100 GigE link]
Ongoing R&D projects
• Porting a traffic recorder
  • HTTP log and flow log
  • Current testbed: 30 Gb/s capability (4x10 Gb/s)
  • ~60,000 flows/s
  • ~10 million active flows
  • Supports both IPv4 and IPv6
• Development of stateless DDoS mitigation
• Development of traffic-based IoT device auto-discovery and analysis
Summary
• A COTS server is capable of 100 GigE
• Data plane solutions are the future for COTS-based appliances
• Rising trend of SW-based network appliances for high-speed networks