Date post: 21-Jan-2016
Memory and network stack tuning in Linux: the story of highly loaded servers migration to fresh Linux distribution Dmitry Samsonov
Memory and network stack tuning in Linux:

the story of highly loaded servers migration to fresh Linux distribution

Dmitry Samsonov

Dmitry Samsonov

Lead System Administrator at Odnoklassniki

Expertise:

● Zabbix

● CFEngine

● Linux tuning


https://www.linkedin.com/in/dmitrysamsonov

OpenSuSE 10.2

Release: 07.12.2006

End of life: 30.11.2008

CentOS 7

Release: 07.07.2014

End of life: 30.06.2024


Video distribution

4 x 10Gbit/s to users

2 x 10Gbit/s to storage

256GB RAM - in-memory cache

22 x 480GB SSD - SSD cache

2 x E5-2690 v2

TOC

● Memory

○ OOM killer

○ Swap

● Network

○ Broken pipe

○ Network load distribution between CPU cores

○ SoftIRQ


Memory

OOM killer

1. All the physical memory

NODE 0 (CPU N0)

NODE 1 (CPU N1)

2. NODE 0 (only)

ZONE_DMA (0-16MB)

ZONE_DMA32 (0-4GB)

ZONE_NORMAL (4+GB)

3. Each zone

Free lists of blocks of 2^0*PAGE_SIZE, 2^1*PAGE_SIZE, 2^2*PAGE_SIZE, ..., 2^10*PAGE_SIZE
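
To make the per-zone free lists concrete, here is a minimal sketch of the block size on each list. It assumes 4 KiB pages (the usual x86-64 base page size, which the slides do not state); `block_size` is a name made up for this illustration.

```python
# Assumption: 4 KiB pages; check the real value with `getconf PAGE_SIZE`.
PAGE_SIZE = 4096
MAX_ORDER = 10  # the slide shows free lists for orders 0..10


def block_size(order, page_size=PAGE_SIZE):
    """Size in bytes of one block on the order-N free list: 2^N * PAGE_SIZE."""
    return (1 << order) * page_size


if __name__ == "__main__":
    for order in range(MAX_ORDER + 1):
        print(f"order {order:2d}: {block_size(order) // 1024:5d} KiB")
```

An allocation of a given order needs one contiguous block of that size, so plenty of free order-0 pages does not help a request for a higher order.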

What is going on?

OOM killer, system CPU spikes!

Memory fragmentation

Memory after server has booted up

After some time

After some more time

Why is this happening?

• Lack of free memory

• Memory pressure


What to do with fragmentation?

Increase vm.min_free_kbytes!

High/low/min watermark.

/proc/zoneinfo

Node 0, zone   Normal
  pages free     2020543
        min      1297238
        low      1621547
        high     1945857
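
The watermarks are reported in pages, so it can help to convert them to MiB and see how much memory the `min` level reserves. The sketch below parses text in the `/proc/zoneinfo` layout, using the values from the slide as sample input. The 4 KiB page size and the function names are assumptions made for this example.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Sample in the layout of /proc/zoneinfo, with the values from the slide.
SAMPLE = """\
Node 0, zone   Normal
  pages free     2020543
        min      1297238
        low      1621547
        high     1945857
"""

ZONE_RE = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
FIELD_RE = re.compile(r"^\s*(?:pages\s+)?(free|min|low|high)\s+(\d+)\s*$")


def parse_watermarks(text):
    """Return {(node, zone): {"free": n, "min": n, "low": n, "high": n}} in pages."""
    zones = {}
    current = None
    for line in text.splitlines():
        m = ZONE_RE.match(line)
        if m:
            current = zones.setdefault((int(m.group(1)), m.group(2)), {})
            continue
        m = FIELD_RE.match(line)
        if m and current is not None:
            # Keep the first occurrence only; later parts of a zone reuse names.
            current.setdefault(m.group(1), int(m.group(2)))
    return zones


def pages_to_mib(pages, page_size=PAGE_SIZE):
    return pages * page_size / (1024 * 1024)


if __name__ == "__main__":
    for (node, zone), wm in parse_watermarks(SAMPLE).items():
        print(f"node {node} zone {zone}: "
              f"free {pages_to_mib(wm['free']):.0f} MiB, "
              f"min {pages_to_mib(wm['min']):.0f} MiB, "
              f"low {pages_to_mib(wm['low']):.0f} MiB, "
              f"high {pages_to_mib(wm['high']):.0f} MiB")
```

To inspect a live system, pass the contents of `/proc/zoneinfo` instead of `SAMPLE`.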


Current fragmentation status

/proc/buddyinfo

Node 0, zone      DMA      0      0      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32   1147    980    813    450    386    115     32     14      2      3      5
Node 0, zone   Normal  55014  15311   1173    120      5      0      0      0      0      0      0
Node 1, zone   Normal  70581  15309   2604    200     32      0      0      0      0      0      0
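
Each column of `/proc/buddyinfo` is the number of free blocks of one order, starting at order 0. A rough way to read the table is to ask how much of the free memory sits in larger blocks. The sketch below does that for the slide's values, with the two halves of each row joined into 11 columns. The page size, the order-4 threshold and the function names are assumptions for illustration, not part of the talk.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Values from the slide, one row per zone, columns are orders 0..10.
SAMPLE = """\
Node 0, zone      DMA      0      0      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32   1147    980    813    450    386    115     32     14      2      3      5
Node 0, zone   Normal  55014  15311   1173    120      5      0      0      0      0      0      0
Node 1, zone   Normal  70581  15309   2604    200     32      0      0      0      0      0      0
"""


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    result = {}
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        result[(int(parts[1]), parts[3])] = [int(x) for x in parts[4:]]
    return result


def free_pages(counts, min_order=0):
    """Free pages held in blocks of at least min_order."""
    return sum(n << order for order, n in enumerate(counts) if order >= min_order)


if __name__ == "__main__":
    for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
        total = free_pages(counts)
        big = free_pages(counts, min_order=4)  # blocks of 64 KiB and larger
        share = 100 * big / total if total else 0.0
        print(f"node {node} {zone:>6}: {total * PAGE_SIZE // 2**20} MiB free, "
              f"{share:.1f}% of it in order>=4 blocks")
```

In the sample, both Normal zones hold almost all of their free memory in the smallest blocks, which is what fragmentation looks like in this file.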

Why is it bad to increase min_free_kbytes?

A part of memory the size of min_free_kbytes will not be available to applications.


Memory

Swap

40GB of free memory and vm.swappiness=0, but the server is still swapping!

What is going on?

1. All the physical memory

NODE 0 (CPU N0)

NODE 1 (CPU N1)

2. NODE 0 (only)

ZONE_DMA (0-16MB)

ZONE_DMA32 (0-4GB)

ZONE_NORMAL (4+GB)

3. Each zone

Free lists of blocks of 2^0*PAGE_SIZE, 2^1*PAGE_SIZE, 2^2*PAGE_SIZE, ..., 2^10*PAGE_SIZE

Uneven memory usage between nodes

[Diagram: free and used memory on NODE 0 (CPU N0) and NODE 1 (CPU N1)]

numastat -m <PID>

numastat -m

                 Node 0          Node 1           Total
        --------------- --------------- ---------------
MemFree        51707.00        23323.77        75030.77
...

Current usage by nodes (in MB)

What to do with NUMA?

Turn off NUMA

• For the whole system (kernel parameter):

numa=off

• Per process:

numactl --interleave=all <cmd>

Prepare application

• Multithreading in all parts

• Node affinity
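
As a rough illustration of node affinity, the sketch below restricts a process to the CPUs of one NUMA node. It reads the node's CPU list from sysfs and uses `os.sched_setaffinity`, which is Linux-only. The helper names are invented for this example. The sketch sets CPU affinity only and does not set a memory policy (that is what `numactl --membind` or `--interleave` is for).

```python
import os


def parse_cpulist(text):
    """Parse a kernel CPU list such as "0-9,20-29" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node):
    """CPUs that belong to a NUMA node, as listed in sysfs."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def pin_to_node(node, pid=0):
    """Restrict a process (0 means the caller) to the CPUs of one NUMA node."""
    os.sched_setaffinity(pid, node_cpus(node))
```

Calling `pin_to_node(0)` early in a worker process keeps its threads on node 0.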


Network


What already had to be done

Ring buffer: ethtool -g/-G

Transmit queue length: ip link/ip link set <DEV> txqueuelen <PACKETS>

Receive queue length: net.core.netdev_max_backlog

Socket buffer:

net.core.<rmem_default|rmem_max>

net.core.<wmem_default|wmem_max>

net.ipv4.<tcp_rmem|udp_rmem_min>

net.ipv4.<tcp_wmem|udp_wmem_min>

net.ipv4.udp_mem

Offload: ethtool -k/-K
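
Before changing any of these settings it is useful to record the current values. The sketch below reads the sysctl parameters from the list above through `/proc/sys`. The UDP per-socket minimums are spelled `udp_rmem_min` and `udp_wmem_min`, which are the names the kernel documents. The function names are made up for this example, and nothing is written.

```python
from pathlib import Path

# Sysctl names from the list above.
PARAMS = [
    "net.core.netdev_max_backlog",
    "net.core.rmem_default",
    "net.core.rmem_max",
    "net.core.wmem_default",
    "net.core.wmem_max",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.udp_mem",
    "net.ipv4.udp_rmem_min",
    "net.ipv4.udp_wmem_min",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root) / name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the value as a list of fields, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().split()
    except OSError:
        return None


if __name__ == "__main__":
    for name in PARAMS:
        print(name, "=", read_sysctl(name))
```

Ring buffer sizes, `txqueuelen` and offloads are per-device settings and are read with `ethtool -g`, `ip link` and `ethtool -k` instead.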

Network

Broken pipe

Broken pipe errors background

In tcpdump: half-duplex close sequence.

What is going on?

OOO

Out-of-order packet, i.e. a packet whose sequence (SEQ) number is not the one the receiver expects next.

What to do with OOO?

All packets of one connection should follow one route:

• Same CPU core

• Same network interface

• Same NIC queue

Configuration:

• Bind threads/processes to CPU cores

• Bind NIC queues to CPU cores

• Use RFS
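
Binding a NIC queue to a core comes down to writing a CPU bitmask to `/proc/irq/<N>/smp_affinity` for that queue's IRQ. A minimal sketch of building those masks follows. The IRQ numbers in the usage note are hypothetical (take the real ones from `/proc/interrupts`) and the function names are invented for this example.

```python
def affinity_mask(cpus):
    """Hex bitmask for /proc/irq/<N>/smp_affinity: bit i set means CPU i is allowed.

    Masks wider than 32 bits are written as comma-separated 32-bit groups.
    """
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    groups = []
    while True:
        groups.append(value & 0xFFFFFFFF)
        value >>= 32
        if value == 0:
            break
    groups.reverse()
    return ",".join([f"{groups[0]:x}"] + [f"{g:08x}" for g in groups[1:]])


def one_queue_per_core(irqs, cpus):
    """Pair each queue IRQ with one core, wrapping around if IRQs outnumber cores."""
    return {irq: affinity_mask([cpus[i % len(cpus)]]) for i, irq in enumerate(irqs)}


def apply_plan(plan):
    """Write the masks (needs root)."""
    for irq, mask in plan.items():
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
            f.write(mask)
```

For example, `one_queue_per_core([42, 43, 44, 45], [0, 1, 2, 3])` maps four hypothetical queue IRQs to cores 0-3. A running irqbalance may rewrite these masks, so static binding usually means stopping it first.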

Before/after

Broken pipes per second per server


Why is static binding bad?

Load distribution between CPU cores might be uneven


Network

Network load distribution between CPU cores


CPU0 utilization at 100%

Why is this happening?

1. Single queue - turn on more: ethtool -l/-L

2. Interrupts are not distributed:

○ dynamic distribution - launch irqbalance/irqd/birq

○ static distribution - configure RSS
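
Whether interrupts are actually distributed can be checked in `/proc/interrupts`, which has one column of counters per CPU. The sketch below sums the counters of all IRQ lines that mention a device. The sample text is a hypothetical excerpt, not data from the talk, and the function name is made up for this example.

```python
# Hypothetical excerpt in the layout of /proc/interrupts (4 CPUs, 2 NIC queues).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 42:     900000          0          0          0   PCI-MSI-edge      eth0-TxRx-0
 43:     850000          0          0          0   PCI-MSI-edge      eth0-TxRx-1
 44:         12          3          0          1   PCI-MSI-edge      ahci
"""


def irq_counts_per_cpu(text, device):
    """Sum interrupt counts per CPU over all IRQ lines that mention the device."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())
    totals = [0] * ncpu
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < ncpu + 2:
            continue  # short lines such as "ERR: 0"
        if device not in " ".join(fields[1 + ncpu:]):
            continue
        for cpu in range(ncpu):
            totals[cpu] += int(fields[1 + cpu])
    return totals


if __name__ == "__main__":
    print(irq_counts_per_cpu(SAMPLE, "eth0"))
```

In the sample, every eth0 interrupt lands on CPU0, which matches the "CPU0 utilization at 100%" picture.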

RSS

[Diagram: Network, eth0 with queues Q0-Q7, RSS, CPU cores 0-12]

CPU0-CPU7 utilization at 100%


We need more queues!

Utilization of 16 cores at 100%

scaling.txt

RPS = Software RSS

XPS = RPS for outgoing packets

RFS = RPS that steers packets to the core where the consuming application runs

https://www.kernel.org/doc/Documentation/networking/scaling.txt
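
The knobs described in scaling.txt are plain files under `/proc/sys` and `/sys/class/net`. The sketch below only builds a list of (path, value) pairs and writes nothing. The device name, queue counts and CPU set in the usage note are assumptions, and 32768 flow entries is the example figure scaling.txt gives for a moderately loaded server. The function names are invented for this illustration.

```python
def cpu_mask(cpus):
    """Hex CPU bitmask, split into comma-separated 32-bit groups."""
    value = sum(1 << c for c in set(cpus))
    digits = f"{value:x}"
    groups = []
    while digits:
        groups.insert(0, digits[-8:])
        digits = digits[:-8]
    return ",".join(groups)


def scaling_plan(dev, rx_queues, tx_queues, cpus, flow_entries=32768):
    """Build (path, value) pairs for RPS, RFS and XPS; nothing is written here."""
    cpus = list(cpus)
    plan = [("/proc/sys/net/core/rps_sock_flow_entries", str(flow_entries))]
    for q in range(rx_queues):
        base = f"/sys/class/net/{dev}/queues/rx-{q}"
        # RPS: CPUs allowed to process packets from this receive queue.
        plan.append((f"{base}/rps_cpus", cpu_mask(cpus)))
        # RFS: per-queue flow table, rps_sock_flow_entries divided by queue count.
        plan.append((f"{base}/rps_flow_cnt", str(flow_entries // rx_queues)))
    for q in range(tx_queues):
        # XPS: give each transmit queue its own slice of the CPUs.
        path = f"/sys/class/net/{dev}/queues/tx-{q}/xps_cpus"
        plan.append((path, cpu_mask(cpus[q::tx_queues])))
    return plan


if __name__ == "__main__":
    for path, value in scaling_plan("eth0", 8, 8, range(8)):
        print(f"echo {value} > {path}")
```

Printing the plan first makes it easy to review before applying it as root.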

Why is RPS/RFS/XPS bad?

1. Load distribution between CPU cores might be uneven.

2. CPU overhead

Accelerated RFS

Mellanox supports it, but after switching it on, the maximum throughput on 10G NICs was only 5Gbit/s.


Intel

Signature Filter (also known as ATR - Application Targeted Receive)

RPS+RFS counterpart

Network

SoftIRQ

How SoftIRQs are born

1. Network: a packet arrives at eth0 and RSS puts it into one of the queues (Q0, Q...).

2. The queue raises a hardware interrupt (HW IRQ 42) on a CPU core (C0, C...).

3. The interrupt handler schedules SoftIRQ NET_RX on the same core (CPU0).

4. HW interrupt processing is finished.

5. The SoftIRQ runs NAPI poll.
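
The per-core NET_RX counters in `/proc/softirqs` show where this work ends up. A minimal sketch of reading them follows. The sample text is a hypothetical excerpt, not data from the talk, and the function name is made up for this example.

```python
# Hypothetical excerpt in the layout of /proc/softirqs (4 CPUs).
SAMPLE = """\
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
      NET_TX:       1200         30         25         28
      NET_RX:    9500000      41000      39000      40500
"""


def softirq_counts(text, name="NET_RX"):
    """Return {"CPU0": count, ...} for one softirq type, or {} if it is absent."""
    lines = text.splitlines()
    cpus = lines[0].split()
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == name + ":":
            return dict(zip(cpus, (int(x) for x in fields[1:])))
    return {}


if __name__ == "__main__":
    counts = softirq_counts(SAMPLE)
    busiest = max(counts, key=counts.get)
    print(f"busiest core for NET_RX: {busiest} ({counts[busiest]} softirqs)")
```

The counters are cumulative since boot, so comparing two readings taken a few seconds apart gives the current rate.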


What to do with high SoftIRQ?

Interrupt moderation

ethtool -c/-C


Why is interrupt moderation bad?

You have to balance between throughput and latency
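
The trade-off can be put into numbers with a deliberately simplified model: if the NIC waits `rx-usecs` microseconds before raising an interrupt, one queue cannot interrupt more often than once per that interval, and a packet may wait up to that long. Real drivers also apply frame-count thresholds and adaptive modes, so treat this as an approximation. The function name is made up for this example.

```python
def max_irq_rate(rx_usecs):
    """Upper bound on interrupts per second per queue for a given rx-usecs delay."""
    if rx_usecs <= 0:
        raise ValueError("rx_usecs must be positive")
    return 1_000_000 // rx_usecs


if __name__ == "__main__":
    for rx_usecs in (10, 50, 200):
        print(f"rx-usecs {rx_usecs:3d}: at most {max_irq_rate(rx_usecs):6d} "
              f"interrupts/s per queue, up to {rx_usecs} us of added latency")
```

A larger delay means fewer interrupts and less SoftIRQ pressure per packet, at the cost of latency. The delay itself is set with `ethtool -C <DEV> rx-usecs <N>`.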

What is going on?

Too rapid growth


Health ministry is warning!

CHANGES

TESTS

REVERT!

KEEP IT

Thank you!

● Odnoklassniki technical blog on habrahabr.ru

http://habrahabr.ru/company/odnoklassniki/

● More about us

http://v.ok.ru/

Dmitry Samsonov

