Linux TCP/IP Tuning

Page 1: Linux TCP/IP Tuning

Copyright 2004 OSDL, All rights reserved.

Analyzing TCP Performance

Sr. Staff Engineer
Linux Kongress 2004

2004-09-09

Stephen Hemminger


Agenda

■ Introduction
■ TCP for muggles
■ Engineering Process
■ Problem examples
■ Network Tools
■ Wrapup


Outside of scope

■ Non-TCP protocols
  ■ SCTP, multicast, etc.
■ Queuing theory (“no math”)
■ Hardware and product comparisons


My Background

■ Did TCP back in the “old school”
  ■ BSD 4.2, Ethernet
  ■ SMP Unix versions of OSI, NetWare, AppleTalk, ...
  ■ Plan 9 hypercube communication
■ Linux
  ■ Incorporation of TCP research in the 2.6 kernel
  ■ Performance tests for LWE
  ■ Wizard gap


Limits of my knowledge

■ Only worked with current Linux (2.4/2.6)
■ Will mention tools here that I have not used extensively
■ Involved in development of Linux, not deployment or research


Agenda

■ Introduction
■ TCP for muggles
■ Engineering Process
■ Problem examples
■ Network Tools
■ Wrapup


TCP for “muggles”

■ connection establishment
■ slow start
■ windows
■ congestion control
■ silly window


Connection establishment

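[Diagram: Client/Server timeline. Client connect sends SYN, Server answers SYN+ACK; Client write sends Data 1(10), Server answers Ack 11. Server-side calls: accept, read.]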
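The handshake in the diagram is carried out by the kernel: connect() returns once the three-way handshake completes, and accept() hands back an already established connection. A minimal Python sketch over loopback (the helper name `handshake_demo` and the 10-byte payload are illustrative assumptions) writes the same 10 bytes as "Data 1(10)" and reads them on the server side:

```python
import socket
import threading


def handshake_demo(payload: bytes = b"0123456789") -> bytes:
    """Send `payload` over a loopback TCP connection and return what the server read."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))   # port 0: let the kernel choose a free port
    srv.listen(1)                # kernel now answers incoming SYNs with SYN+ACK
    port = srv.getsockname()[1]
    received: list[bytes] = []

    def server() -> None:
        conn, _addr = srv.accept()   # returns a fully established connection
        with conn:
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:         # empty read: client closed (FIN)
                    break
                chunks.append(data)
            received.append(b"".join(chunks))

    worker = threading.Thread(target=server)
    worker.start()
    # connect() sends the SYN and returns once the handshake has completed.
    with socket.create_connection(("127.0.0.1", port)) as cli:
        cli.sendall(payload)         # 10 bytes, like "Data 1(10)" in the diagram
    worker.join()
    srv.close()
    return received[0]


if __name__ == "__main__":
    print(handshake_demo())
```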


Ethereal


tcpdump trace

13:28:21.745624 IP 172.20.1.60.38052 > 216.239.39.99.http: S 1765497548:1765497548(0) win 5840 <mss 1460,sackOK,timestamp 1563951453 0,nop,wscale 7>
13:28:21.831935 IP 216.239.39.99.http > 172.20.1.60.38052: S 227058185:227058185(0) ack 1765497549 win 8190 <mss 1460>
13:28:21.832035 IP 172.20.1.60.38052 > 216.239.39.99.http: . ack 1 win 5840
13:28:21.832321 IP 172.20.1.60.38052 > 216.239.39.99.http: P 1:126(125) ack 1 win 5840
13:28:21.939237 IP 216.239.39.99.http > 172.20.1.60.38052: . ack 126 win 31460
13:28:21.972448 IP 216.239.39.99.http > 172.20.1.60.38052: P 1:485(484) ack 126 win 31460
13:28:21.972529 IP 172.20.1.60.38052 > 216.239.39.99.http: . ack 485 win 6432
13:28:21.973016 IP 172.20.1.60.38052 > 216.239.39.99.http: F 126:126(0) ack 485 win 6432


Flow control

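[Diagram: the receiver advertises ACK 1010 (window 5000). The sender's write goes out as Data 1011 (1400), Data 2411 (1400), Data 3811 (1400) and Data 5211 (800), filling the window, and the receiver answers Ack 6010 (window 0). After the application does read (1000), the receiver sends Ack 6010 (window 1000).]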
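Flow control can be sketched as a receiver whose advertised window is simply the free space in its buffer. This Python toy (the class `Receiver`, the helper `send`, the 1400-byte MSS and the 8000 bytes queued by the sender are assumptions for illustration) reproduces the exchange: segments of 1400, 1400, 1400 and 800 bytes fill the 5000-byte window, it drops to 0, and a 1000-byte read reopens it to 1000. Because an ACK number names the next byte expected, the toy acknowledges 6011 after data starting at 1011, one more than the 6010 in the diagram.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """Toy receiver whose advertised window is the free space in its buffer."""
    buffer_size: int
    next_seq: int          # next byte expected, i.e. the ACK number it sends
    buffered: int = 0      # bytes received but not yet read by the application

    def window(self) -> int:
        return self.buffer_size - self.buffered

    def receive(self, seq: int, length: int) -> tuple[int, int]:
        """Accept an in-order segment; return (ACK number, advertised window)."""
        if seq != self.next_seq or length > self.window():
            raise ValueError("segment out of order or beyond the window")
        self.next_seq += length
        self.buffered += length
        return self.next_seq, self.window()

    def app_read(self, n: int) -> tuple[int, int]:
        """Application read frees buffer space; return the window update."""
        self.buffered -= min(n, self.buffered)
        return self.next_seq, self.window()


def send(rcv: Receiver, seq: int, pending: int, mss: int = 1400) -> list[tuple[int, int]]:
    """Send up to `pending` bytes in MSS-sized segments, never past the window."""
    segments = []
    window = rcv.window()
    while pending > 0 and window > 0:
        length = min(mss, pending, window)
        segments.append((seq, length))
        _ack, window = rcv.receive(seq, length)
        seq += length
        pending -= length
    return segments


if __name__ == "__main__":
    rcv = Receiver(buffer_size=5000, next_seq=1011)
    print(send(rcv, 1011, 8000))   # four segments, then the window is closed
    print(rcv.app_read(1000))      # window update after the application reads
```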


Retransmission

[Diagram: the sender's write goes out as Data 1 and Data 2; the receiver keeps answering Ack 1.]

Multiple duplicate ACKs = fast retransmit
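The duplicate-ACK rule fits in a few lines. The hypothetical helper `fast_retransmit_points` below scans a stream of cumulative ACK numbers and reports where a sender would fast-retransmit; the threshold of three duplicates is the usual value from the TCP congestion control RFCs (2581, later 5681), but it is a parameter here.

```python
def fast_retransmit_points(acks: list[int], threshold: int = 3) -> list[int]:
    """Return the sequence numbers a sender would fast-retransmit.

    A duplicate ACK repeats the previous cumulative ACK number. Once
    `threshold` duplicates arrive, the sender resends the segment starting
    at that number instead of waiting for the retransmission timer.
    """
    retransmits = []
    last_ack = None
    dups = 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == threshold:      # fire once per run of duplicates
                retransmits.append(ack)
        else:
            last_ack, dups = ack, 0    # new data acknowledged: reset the count
    return retransmits


if __name__ == "__main__":
    print(fast_retransmit_points([1, 2, 2, 2, 2, 2, 3]))
```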


Tcptrace

http://tcptrace.org

Tool to convert captured data into graphs:
■ Time sequence graph
■ Throughput
■ RTT

It does far more than there is time to cover here!


Xplot

http://xplot.org
■ Takes plot command scripts
■ Mouse
  ■ Zoom: drag with the left button
  ■ Zoom out: click the left button
  ■ Scroll: drag with the middle button
  ■ Dump: shift-left button produces PostScript
  ■ Shift-middle and shift-right also


Time Sequence Graph


Windows & Buffering

■ Used to isolate TCP from application read/write
■ Used for congestion control
■ Upper bound determined by system parameters


Congestion window

■ Slow start
  ■ Window normally starts small
  ■ Grows in response to ACKs
■ Congestion control
  ■ Packet loss = congestion


Silly Window

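[Diagram: the sender's application writes 8K bytes, but the receiver advertises only Ack [10], a 10-byte window. Sender: “Hey, I am not going to try and send this data now, give me a bigger window first.” After the receiving application reads 8K bytes, the receiver advertises Ack [2000]; the sender answers “OK, thanks” and sends Data (2000).]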
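Sender-side silly window avoidance can be sketched from the rules in RFC 1122 (section 4.2.3.4): send only if a full-sized segment fits, if pushed data fits entirely, or if at least half of the largest window the peer has ever offered is usable. The function `should_send` and the 1460-byte MSS are assumptions for illustration, and the RFC's override timer is omitted.

```python
def should_send(queued: int, usable_window: int, mss: int,
                max_window_seen: int, pushed: bool = False, fs: float = 0.5) -> bool:
    """Sender-side silly window avoidance (after RFC 1122, 4.2.3.4).

    queued:          bytes waiting in the send buffer
    usable_window:   bytes the peer's advertised window currently allows
    max_window_seen: largest window the peer has advertised so far
    The RFC's override timeout is not modelled.
    """
    if queued == 0 or usable_window == 0:
        return False
    sendable = min(queued, usable_window)
    if sendable >= mss:                        # a full-sized segment fits
        return True
    if pushed and queued <= usable_window:     # all pushed data fits now
        return True
    return sendable >= fs * max_window_seen    # a large share of the max window


if __name__ == "__main__":
    # 8K queued, peer offers only 10 bytes: wait for a bigger window.
    print(should_send(8192, 10, 1460, max_window_seen=2000))
    # Peer now offers 2000 bytes: a full segment fits, so send.
    print(should_send(8192, 2000, 1460, max_window_seen=2000))
```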


Model of TCP networks

[Diagram: Sender (send window) → Network → Receiver (receive window); data flows from sender to receiver and ACKs flow back.]

BDP = Bandwidth (bytes/sec) * Delay (sec)


BDP - Bandwidth Delay Product

■ BDP = amount of data in transit
■ Examples
  ■ DSL/cable modem (international)
    1,000,000 bit/sec * 1/8 byte/bit * 500 ms = 62,500 bytes
  ■ Gigabit across the US
    1,000,000,000 bit/sec * 1/8 byte/bit * 70 ms = 8.75 Mbytes
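Both examples are plain arithmetic. A one-line Python helper (the name `bdp_bytes` is hypothetical) reproduces them and makes it easy to try other paths; delay is in seconds, so 500 ms is 0.5.

```python
def bdp_bytes(bandwidth_bps: float, delay_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return bandwidth_bps / 8 * delay_s   # bits/s -> bytes/s, times seconds


if __name__ == "__main__":
    print(bdp_bytes(1_000_000, 0.500))        # DSL/cable modem, international
    print(bdp_bytes(1_000_000_000, 0.070))    # gigabit across the US
```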


[Chart: Bandwidth Delay Product (BDP). Log-log plot of bandwidth (Mbits/sec, 0.1 to 1000) against delay (ms, 0.1 to 1000), with BDP contours at 8K, 64K and 1M; regions labeled LAN, Broadband and Research.]


Internet

■ Router queues
■ Delays
  ■ Speed of light (70 ms coast to coast)
  ■ Slow routers
■ Packet correlation, sizes
■ DoS


Extensions for larger windows

■ TCP Selective Acknowledgement (SACK), RFC 2018
  ■ Don't have to retransmit everything
■ Window scaling (RFC 1323)
  ■ Window size multiplied by 2^n
■ Protection Against Wrapped Sequences (PAWS)
  ■ Timestamp inside each packet
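Why PAWS matters at high speed is quick arithmetic: the 32-bit sequence space covers 4 GB, so at 1 Gbit/s it wraps in about 34 seconds, shorter than the two-minute maximum segment lifetime assumed by RFC 793. The sketch below (function names are illustrative) computes that wrap time and shows the modulo-2^32 comparison used to decide whether one 32-bit timestamp is older than another, which is how a PAWS check can reject an old duplicate.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers and timestamps are 32-bit


def wrap_time_s(bandwidth_bps: float) -> float:
    """Seconds until the 32-bit sequence space wraps at a given line rate."""
    return SEQ_SPACE / (bandwidth_bps / 8)


def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is older than b, allowing for wraparound.

    PAWS treats a segment whose timestamp is 'before' the most recent one
    seen on the connection as an old duplicate and drops it.
    """
    return ((a - b) % SEQ_SPACE) >= SEQ_SPACE // 2   # negative as signed 32-bit


if __name__ == "__main__":
    print(round(wrap_time_s(1e9), 1))
    print(ts_before(0xFFFFFFF0, 0x10))   # just before the wrap vs. just after
```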


TCP options negotiation 1

IP 172.20.1.60.32820 > 216.239.39.99.http: S 3599527174:3599527174(0) win 5840 <mss 1460,sackOK,timestamp 2519711 0,nop,wscale 2>
IP 216.239.39.99.http > 172.20.1.60.32820: S 3820474812:3820474812(0) ack 3599527175 win 8190 <mss 1460>
IP 172.20.1.60.32820 > 216.239.39.99.http: . ack 1 win 5840
IP 172.20.1.60.32820 > 216.239.39.99.http: P 1:126(125) ack 1 win 5840

Window scale by 4

But server doesn't support scaling


TCP options negotiation 2

IP 172.20.1.60.32823 > 65.172.181.13.http: S 4120108902:4120108902(0) win 5840 <mss 1460,sackOK,timestamp 3036627 0,nop,wscale 2>
IP 65.172.181.13.http > 172.20.1.60.32823: S 2295773021:2295773021(0) ack 4120108903 win 5792 <mss 1460,sackOK,timestamp 1818411318 3036627,nop,wscale 0>
IP 172.20.1.60.32823 > 65.172.181.13.http: . ack 1 win 1460 <nop,nop,timestamp 3036628 1818411318>
IP 172.20.1.60.32823 > 65.172.181.13.http: P 1:144(143) ack 1 win 1460 <nop,nop,timestamp 3036628 1818411318>

Window scale by 4

Your scaling is okay, but don't scale mine
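Working out the window a peer actually offers from traces like these is mechanical: scaling is only in effect if both SYNs carried a wscale option, and then each side's 16-bit window field is shifted by the value that side announced. A Python sketch (the helpers `syn_wscale` and `window_bytes` are hypothetical, and the regular expression assumes the tcpdump option format in the traces above) confirms that the client's later `win 1460` in the second trace means 1460 << 2 = 5840 bytes, while in the first trace the server's missing option leaves `win 5840` unscaled.

```python
import re
from typing import Optional


def syn_wscale(line: str) -> Optional[int]:
    """Window-scale shift announced in a tcpdump SYN line, or None if absent."""
    match = re.search(r"wscale (\d+)", line)
    return int(match.group(1)) if match else None


def window_bytes(win_field: int, sender_shift: Optional[int],
                 peer_shift: Optional[int]) -> int:
    """Bytes offered by a non-SYN segment's 16-bit window field.

    Scaling applies only if both SYNs carried the option (RFC 1323); the
    field is then shifted by the value announced by the segment's sender.
    """
    if sender_shift is None or peer_shift is None:
        return win_field
    return win_field << sender_shift


if __name__ == "__main__":
    client_syn = ("IP 172.20.1.60.32823 > 65.172.181.13.http: S 4120108902:4120108902(0) "
                  "win 5840 <mss 1460,sackOK,timestamp 3036627 0,nop,wscale 2>")
    server_synack = ("IP 65.172.181.13.http > 172.20.1.60.32823: S 2295773021:2295773021(0) "
                     "ack 4120108903 win 5792 <mss 1460,sackOK,timestamp 1818411318 3036627,nop,wscale 0>")
    c, s = syn_wscale(client_syn), syn_wscale(server_synack)
    print(window_bytes(1460, c, s))   # client's "win 1460" after the handshake
```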


Linux TCP window tuning

■ Send window: net.ipv4.tcp_wmem
  ■ three values: min, default, max
  ■ default is 4K 16K 128K
  ■ also limited by net.core.wmem_max
■ Receive window: net.ipv4.tcp_rmem
  ■ three values: min, default, max
  ■ default is 4K 85K 170K
  ■ also limited by net.core.rmem_max
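Checking whether these limits cover a given path combines the sysctl values with the BDP. A Linux-only Python sketch (helper names are hypothetical) reads a three-value setting through /proc/sys and compares the tcp_rmem maximum with one bandwidth-delay product. It ignores the share of the buffer the kernel keeps for overhead (governed by net.ipv4.tcp_adv_win_scale), so it somewhat overstates the usable window.

```python
from pathlib import Path


def parse_triple(text: str) -> tuple[int, int, int]:
    """Parse a 'min default max' value such as net.ipv4.tcp_rmem."""
    low, default, high = (int(v) for v in text.split())
    return low, default, high


def read_sysctl(name: str) -> str:
    """Read a sysctl like 'net.ipv4.tcp_rmem' through /proc/sys (Linux only)."""
    return Path("/proc/sys", *name.split(".")).read_text()


def max_buffer_covers_bdp(rmem: tuple[int, int, int],
                          bandwidth_bps: float, delay_s: float) -> bool:
    """True if the maximum buffer can hold one bandwidth-delay product."""
    return rmem[2] >= bandwidth_bps / 8 * delay_s
```

On a Linux host, `parse_triple(read_sysctl("net.ipv4.tcp_rmem"))` returns the live values; with the 170K default maximum the check passes for the DSL example but fails for gigabit across the US.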


Linux TCP window tuning

■ Overall memory: net.ipv4.tcp_mem
  ■ three values: low, pressure, max (in pages)
  ■ automatic value based on system memory
■ Application window: net.ipv4.tcp_app_win
  ■ reserved space to handle slow applications
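Unlike tcp_rmem and tcp_wmem, the tcp_mem values are counted in memory pages, not bytes. A small conversion sketch (the function name is illustrative; the page size comes from os.sysconf unless supplied):

```python
import os
from typing import Optional


def tcp_mem_bytes(text: str, page_size: Optional[int] = None) -> tuple[int, int, int]:
    """Convert a net.ipv4.tcp_mem value ('low pressure max', in pages) to bytes."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")   # commonly 4096
    low, pressure, high = (int(v) * page_size for v in text.split())
    return low, pressure, high
```

Passing the contents of /proc/sys/net/ipv4/tcp_mem gives the three thresholds in bytes for the running system.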


But!

■ Some firewalls and routers are buggy
  ■ Corrupt the window scale, changing N to 0
  ■ Forget to track state, or read the RFC wrong
  ■ Connections hang because the initial window looks like a silly window
  ■ 1% of the net is buggy
■ Linux 2.6.9 chooses the window scale based on the maximum possible receive window
  ■ Default tcp_rmem => window scale of 2
  ■ Buggy devices will see ¼ of the real window
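The scale Linux picks can be reproduced with a short loop: halve the maximum receive space until it fits the 16-bit window field, counting the shifts (capped at 14 by RFC 1323). This Python sketch is modelled on the kernel's tcp_select_initial_window(), which also takes net.core.rmem_max into account; the helper names are hypothetical. Assuming the usual 2.6 default maximum of 174760 bytes (the 170K above), it yields a shift of 2, and a device that ignores that shift reads only a quarter of the window.

```python
MAX_WSCALE = 14   # largest shift RFC 1323 allows


def choose_wscale(max_rcv_space: int) -> int:
    """Smallest shift that lets max_rcv_space fit the 16-bit window field."""
    shift = 0
    space = max_rcv_space
    while space > 65535 and shift < MAX_WSCALE:
        space >>= 1
        shift += 1
    return shift


def window_seen_without_scaling(real_window: int, shift: int) -> int:
    """What a box that ignores the scale reads from the raw window field."""
    return real_window >> shift


if __name__ == "__main__":
    shift = choose_wscale(174760)                       # default tcp_rmem max
    print(shift, window_seen_without_scaling(87380, shift))
```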


Break


Agenda

■ Introduction
■ TCP for muggles
■ Engineering Process
■ Problem examples
■ Network Tools
■ Wrapup


Performance Engineering process

■ Define your goal
■ Capture information
■ Analyze and form a hypothesis
■ Prototype to validate the hypothesis
■ If successful
  ■ Make changes on the production system
  ■ Report problems or patches to others


Goal setting

■ Know what is possible:
  ■ bus bandwidth, network latency, etc.
■ Know your application
  ■ Compare with similar applications


TCP performance testing

■ Goal: improve TCP performance over links with a high bandwidth * delay product
■ Plan:
  ■ New TCP congestion control
  ■ Validate and test


Testing TCP over WAN

■ Want to test performance of TCP over high-BDP links
■ Can't afford a 10 Gbit trans-continental link
■ Proposal: emulate network delay over 1 Gbit Ethernet


Existing network emulation tools

■ Dummynet
  http://info.iet.unipi.it/~luigi/ip_dummynet/
  I don't want to set up a separate FreeBSD machine
■ NISTnet
  http://snad.ncsl.nist.gov/itg/nistnet/
  Only on 2.4 and not ready to be in the main tree


Netem

http://developer.osdl.org/shemminger/netem
■ Started out as a simple delay-only hack
■ Has grown to cover all the functionality of NISTnet

[Diagram: protocol stack TCP → IP → netem → Ethernet (eth0); netem sits between IP and the network device.]
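Netem is configured with the standard tc tool. A Python sketch that builds the command line (the helper `netem_cmd` is hypothetical; eth0, the delay and the loss rate are example values) keeps test scripts readable. Netem acts on packets leaving the interface, so a delay applied on one host is one-way; running the command needs root, and `tc qdisc del dev eth0 root` removes it again.

```python
import shlex
from typing import Optional


def netem_cmd(dev: str, delay_ms: int, loss_pct: Optional[float] = None) -> list[str]:
    """Build a `tc` command adding a netem qdisc that delays egress packets on dev."""
    cmd = ["tc", "qdisc", "add", "dev", dev, "root", "netem", "delay", f"{delay_ms}ms"]
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]   # optional random packet loss
    return cmd


if __name__ == "__main__":
    print(shlex.join(netem_cmd("eth0", 100)))
    # To apply (as root): subprocess.run(netem_cmd("eth0", 100), check=True)
```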


Current TCP research

■ Alternative TCP congestion control
  ■ Vegas
  ■ Westwood
  ■ Binary Increase Congestion Control (BIC)
■ Research community based around Web100


TCP Reno

■ Standard default in 2.4/2.6
■ Adjusts congestion window based on packet loss
■ Slow start: window starts small, then grows exponentially
■ Additive increase of the window on each ACK
■ Multiplicative decrease on loss
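Reno's window dynamics are easy to see in a per-RTT toy model. The Python sketch below (window in segments; the function name, initial window of 1 and the loss schedule are assumptions) doubles cwnd each RTT during slow start up to ssthresh, then adds one segment per RTT, and on a loss halves the window as in fast recovery; timeouts are not modelled.

```python
def reno_cwnd_trace(rtts: int, loss_at: set[int], ssthresh: float = 64.0,
                    initial_cwnd: float = 1.0) -> list[float]:
    """Congestion window (segments) at the start of each RTT in a Reno-like model."""
    cwnd = initial_cwnd
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            ssthresh = max(cwnd / 2, 2.0)   # multiplicative decrease
            cwnd = ssthresh                 # fast recovery: resume from half
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 segment per RTT
    return trace


if __name__ == "__main__":
    print(reno_cwnd_trace(10, loss_at={8}, ssthresh=16))
```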


TCP Vegas

■ Original work by Larry Peterson
■ Patches existed for 2.2 and 2.4, and as part of Web100
■ sysctl net.ipv4.tcp_cong_avoid
■ Measures bandwidth based on RTT
■ Adjusts congestion window based on bandwidth
■ Avoids packet loss
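The Vegas idea of measuring instead of waiting for loss fits in one function: compare the rate the window would achieve at the base (minimum) RTT with the rate actually seen, and treat the difference as segments queued in the network. In this Python sketch the name `vegas_update` and the thresholds alpha = 2 and beta = 4 segments are assumptions; real implementations add slow-start handling and other details.

```python
def vegas_update(cwnd: float, base_rtt: float, rtt: float,
                 alpha: float = 2.0, beta: float = 4.0) -> float:
    """One per-RTT TCP Vegas adjustment of cwnd (in segments).

    expected = cwnd / base_rtt   (rate with no queueing)
    actual   = cwnd / rtt        (rate actually achieved)
    diff     = (expected - actual) * base_rtt, roughly segments queued in the path
    """
    diff = (cwnd / base_rtt - cwnd / rtt) * base_rtt
    if diff < alpha:
        return cwnd + 1          # little queueing: probe for more bandwidth
    if diff > beta:
        return cwnd - 1          # queue building up: back off before loss
    return cwnd                  # between alpha and beta: hold steady


if __name__ == "__main__":
    print(vegas_update(100, base_rtt=0.100, rtt=0.100))   # no extra delay
    print(vegas_update(100, base_rtt=0.100, rtt=0.200))   # RTT doubled by queueing
```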


TCP Westwood

■ Work by Claudio Casetti
■ Patches for 2.4 by Angelo Dell'Aera
■ sysctl net.ipv4.tcp_westwood
■ Focused on wireless
  ■ packet loss != congestion
■ Measures bandwidth based on RTT
■ Uses normal Reno until congestion, then adjusts the congestion window based on bandwidth
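Westwood's loss response can be sketched as setting ssthresh to the estimated bandwidth-delay product rather than to half the window. In this Python sketch the bandwidth estimate is simply an input (a real implementation filters it from the returning ACK stream), and the function name and the floor of two segments are assumptions.

```python
def westwood_ssthresh(bw_estimate_Bps: float, rtt_min_s: float, mss: int) -> int:
    """Westwood-style ssthresh (in segments) after a loss.

    Instead of halving cwnd like Reno, size ssthresh to the estimated
    path BDP: bandwidth estimate times the minimum RTT observed.
    """
    return max(2, int(bw_estimate_Bps * rtt_min_s / mss))


if __name__ == "__main__":
    # 1.46 MB/s estimated, 100 ms minimum RTT, 1460-byte segments
    print(westwood_ssthresh(1_460_000, 0.1, 1460))
```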


Binary Increase Congestion Control (BIC)

■ Work by Lisong Xu
■ Patches for Web100 (2.4)
■ sysctl net.ipv4.tcp_bic
■ Designed for high-speed networks
■ Modification of Reno
  ■ Uses additive increase when the congestion window is large
  ■ Binary search increase when the window is small
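BIC's increase rule can be sketched as a per-RTT step toward w_max, the window at the last loss: jump halfway there, but never by more than s_max (which turns the step into an additive increase) or less than s_min. The function name and the values s_max = 32 and s_min = 0.01 segments are illustrative assumptions, and the max-probing phase above w_max is left out.

```python
def bic_next_cwnd(cwnd: float, w_max: float,
                  s_max: float = 32.0, s_min: float = 0.01) -> float:
    """One per-RTT BIC increase while cwnd is below w_max (window at last loss)."""
    if cwnd >= w_max:
        raise ValueError("max probing above w_max is not modelled here")
    step = (w_max - cwnd) / 2             # binary search: aim for the midpoint
    step = min(max(step, s_min), s_max)   # far away: capped at s_max (additive)
    return cwnd + step


if __name__ == "__main__":
    cwnd = 100.0
    for _ in range(5):
        cwnd = bic_next_cwnd(cwnd, w_max=200.0)
        print(cwnd)
```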


Tuning

■ Default TCP parameters are not big enough
  ■ Need bigger send and receive windows
■ Send window is already autosized based on RTT
■ Receive window autosizing was done in Web100


Receiver Tuning

■ Patches from John Heffner
■ sysctl net.ipv4.tcp_moderate_rcvbuf
■ Dynamic Right Sizing (DRS)
  ■ Adjusts the receive window based on RTT
  ■ If the application doesn't set the window, the kernel does it for them
  ■ Window grows from default to max
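Dynamic Right Sizing can be sketched as: measure how much data arrived during one RTT, which approximates the sender's window, and let the receive buffer grow to twice that, never shrinking and never beyond the tcp_rmem maximum. The factor of two, the function name and the sample numbers are assumptions for illustration, not a description of the exact kernel code.

```python
def drs_rcvbuf(current: int, bytes_last_rtt: int, rmem_max: int) -> int:
    """Next receive-buffer size from the bytes received in the last RTT."""
    wanted = 2 * bytes_last_rtt          # room for the sender to keep growing
    return min(rmem_max, max(current, wanted))


if __name__ == "__main__":
    buf = 87380                          # default from tcp_rmem
    for delivered in (40_000, 60_000, 120_000):
        buf = drs_rcvbuf(buf, delivered, rmem_max=174760)
        print(buf)
```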


Receiver auto-tuning

[Chart: throughput (Mbits/sec, 0 to 1000) against delay (ms, 0 to 200) for the Default and Auto Tuned receive windows.]


Throughput vs Delay (initial run)

[Chart: bandwidth (Mbits/sec, 0 to 800) against delay (ms, 0 to 200) for Reno, Vegas, Westwood and BIC.]


What's happening

■ NAPI
  ■ Driver API that allows avoiding interrupts
  ■ Trades off latency for overall performance
■ E1000 driver
  ■ Uses NAPI for transmit

Answer: the transmit ring gets full and the driver flow-blocks.

Solution: set TxDescriptors=1000


Throughput vs Delay (rerun)

[Chart: throughput (axis labeled bits/sec, 0 to 800) against delay (ms, 0 to 200) for Reno, Vegas, Westwood and BIC.]


Performance still slow

■ Vegas and Westwood are terrible
■ Not at full link speed
■ Performance falls off with delay


Vegas trace with 100ms delay


Vegas detail


Westwood (70ms)


Westwood detail


BIC trace (100ms)


BIC detail (100ms)


How to squeeze out more performance

■ Large MTU (4K): +63%
■ LAN driver built into the kernel, not as a module: up to +10%
■ Turn off timestamps: +4%
■ Bind IRQ to a processor: varies
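Binding an interrupt to a processor on Linux means writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity, where <N> is the IRQ number listed in /proc/interrupts. A small Python helper (hypothetical names) builds the mask; writing it needs root, and a running irqbalance daemon may later rewrite it.

```python
from pathlib import Path


def smp_affinity_mask(cpus: list[int]) -> str:
    """Hex bitmask with one bit set per CPU, as used by /proc/irq/<N>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def bind_irq(irq: int, cpus: list[int]) -> None:
    """Restrict interrupt `irq` to the given CPUs (Linux, root only)."""
    Path(f"/proc/irq/{irq}/smp_affinity").write_text(smp_affinity_mask(cpus))
```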


Congestion control: more work

■ Vegas doesn't use the available window
  ■ Does it underestimate bandwidth?
■ Westwood
  ■ Another bandwidth problem
■ BIC
  ■ When does it make it into binary mode?
  ■ What is holding back the window?
■ Netem
  ■ Higher resolution? Packet groups?


Break


Agenda

■ Introduction
■ TCP for muggles
■ Engineering Process
■ Problem examples
■ Network Tools
■ Wrapup


Other tools

■ Information about
  ■ ISP connection
  ■ Sockets open
■ Testing infrastructure
■ More data capture
■ Monitoring


Tools: basic

■ Network path information
■ Ping: sends ICMP echo
  ■ Measures round-trip time and loss
  ■ Can be blocked by firewalls
■ Traceroute: uses IP source routing
  ■ Usually blocked now
■ Pathcapture (pcap)
  ■ Bandwidth and delay measurement


Tools: Network interface

■ ifconfig
  ■ Basic statistics: packets sent/received, errors
■ ip -stats link
  ■ Newer alternative, may have more info
■ SNMP
  ■ Remote access to the same information
  ■ Slightly more work
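On Linux the per-interface counters that ifconfig and ip print can also be read from /proc/net/dev. A parsing sketch (the function names are hypothetical; it keeps only the byte, packet and error columns for each direction and assumes the usual two header lines followed by eight receive and eight transmit columns per interface):

```python
from pathlib import Path


def parse_net_dev(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/dev into per-interface receive/transmit counters."""
    stats = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        f = [int(v) for v in rest.split()]
        # Columns: 8 receive fields, then 8 transmit fields.
        stats[name.strip()] = {
            "rx_bytes": f[0], "rx_packets": f[1], "rx_errs": f[2],
            "tx_bytes": f[8], "tx_packets": f[9], "tx_errs": f[10],
        }
    return stats


def read_net_dev() -> dict[str, dict[str, int]]:
    """Current counters for all interfaces (Linux only)."""
    return parse_net_dev(Path("/proc/net/dev").read_text())
```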


Tools: Sockets

■ Netstat
  ■ TCP statistics
  ■ Open sockets
■ ss
  ■ More statistics available (RTT, etc.)
■ recvmsg
  ■ Application can see TCP info (cmsg)
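On Linux an application can also query per-connection TCP state with getsockopt(TCP_INFO), which fills in a struct tcp_info; ss reports the same kind of data. A Python sketch that decodes the leading fields, assuming the struct tcp_info layout from the kernel header linux/tcp.h and native byte order (the function name is hypothetical):

```python
import struct

# Leading part of struct tcp_info (linux/tcp.h): eight 8-bit fields
# (state, ca_state, retransmits, ...) followed by 32-bit counters.
# Only the first 19 u32 fields, up to tcpi_snd_cwnd, are decoded here.
_TCP_INFO_HEAD = struct.Struct("8B19I")


def parse_tcp_info(buf: bytes) -> dict[str, int]:
    """Decode selected fields from a TCP_INFO getsockopt() buffer."""
    f = _TCP_INFO_HEAD.unpack_from(buf)
    u32 = f[8:]
    return {
        "state": f[0],
        "retransmits": f[2],
        "rto_us": u32[0],
        "snd_mss": u32[2],
        "rtt_us": u32[15],        # smoothed RTT, microseconds
        "rttvar_us": u32[16],
        "snd_ssthresh": u32[17],
        "snd_cwnd": u32[18],      # congestion window, in segments
    }
```

With a connected TCP socket `sock` on Linux, `parse_tcp_info(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104))` returns these fields for that connection.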


Tools: test servers

■ SYN test
  telnet syntest.psc.edu 7960
■ TCP bandwidth
  http://www.epm.ornl.gov/~dunigan/java/misc/tcpbw.html
  http://dslreports.com
■ ANL network config
  http://miranda.ctd.anl.gov:7123
■ Path MTU
  http://www.ncne.org/jumbogram/mtu_discovery.php


Tools: testing

■ Ttcp
  ■ Basic send/receive throughput
■ Iperf
  ■ Longer-running tests and turnaround
■ Netperf
  ■ Includes CPU and other statistics
■ Dbs
  ■ Multiclient testing


Tools: monitoring

■ Ntop
  ■ Measures network activity by service
  ■ Nice web interface
■ Mailgraph
  ■ Long-term mail statistics
■ Web server activity log analysis


Tools: data capture

■ Tcpdump
  ■ Filter packets by protocol, address, etc.
  ■ Decode many protocols
■ Ethereal
  ■ GUI interface
■ RMON
  ■ Remote monitoring
■ Kismet
  ■ Wireless activity


Tools: generators

■ Pktgen
  ■ Kernel-level packet generation
  ■ Can generate the maximum hardware packet rate
■ Network packet generator
  ■ Application level


Tools: simulation

■ Ns
  ■ Describe the overall system
  ■ Event-based simulation
  ■ Used for protocol analysis
■ SSFnet
  ■ More detailed models of real hardware


Tools: client simulator

■ Web
  ■ SPECweb, Apache (as), httpload
■ NFS
  ■ Nfsstone
■ FTP
  ■ Dkftpbench


Conclusion

■ Data capture can provide clues to:
  ■ Application problems
  ■ Device problems
  ■ TCP/IP problems
■ Nothing is ever simple

