Network Performance Optimisation and Load Balancing

Page 1: Network Performance Optimisation and Load Balancing

Network Performance Optimisation and Load Balancing

Wulf Thannhaeuser

Page 2: Network Performance Optimisation and Load Balancing


Network Performance Optimisation

Page 3: Network Performance Optimisation and Load Balancing


Network Optimisation: Where?

[Diagram: LHC-B detector and data-acquisition architecture. The Front-End Electronics of the sub-detectors (VDET, TRACK, ECAL, HCAL, MUON, RICH) produce data at 40 MHz (40 TB/s). The Level 0 Trigger (fixed latency 4.0 µs) reduces the rate to 1 MHz (1 TB/s), and the Level 1 Trigger (variable latency <1 ms) to 40 kHz. Front-End Multiplexers (FEM) and Read-out Units (RU) feed the Read-out Network (RN) at 2-4 GB/s. Sub-Farm Controllers (SFC) pass events at 4 GB/s over a LAN to the Trigger Level 2 & 3 / Event Filter CPU farm, and accepted data is written to Storage at 20 MB/s. Timing & Fast Control and Control & Monitoring systems supervise the read-out, with a throttle signal back towards the front end. There are 60 SFCs with ca. 16 “off the shelf” PCs each.]

Page 4: Network Performance Optimisation and Load Balancing


Network Optimisation: Why?

(Fast) Ethernet → Gigabit Ethernet

Ethernet speed: 10 Mb/s
Fast Ethernet speed: 100 Mb/s
Gigabit Ethernet speed: 1000 Mb/s (considering full-duplex: 2000 Mb/s)

Page 5: Network Performance Optimisation and Load Balancing


Network Optimisation: Why?

An “average” CPU might not be able to process such a huge number of data packets per second:
- TCP/IP overhead
- Context switching
- Packet checksums

An “average” PCI bus is 33 MHz, 32 bits wide.
Theory: 33 MHz × 32 bit = 1056 Mbit/s
In practice: ca. 850 Mbit/s (PCI overhead, burst size)

Page 6: Network Performance Optimisation and Load Balancing

Network Optimisation: How?
(CPU and PCI bus limitations as on Page 5)

Reduce per-packet overhead: replace TCP with UDP

Page 7: Network Performance Optimisation and Load Balancing

TCP / UDP Comparison

• TCP (Transmission Control Protocol):
  - connection-oriented protocol
  - full-duplex
  - messages received in order, no loss or duplication
  ⇒ reliable, but with overheads

• UDP (User Datagram Protocol):
  - messages are called “datagrams”
  - messages may be lost or duplicated
  - messages may be received out of order
  ⇒ unreliable, but potentially faster
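
As an illustration (not part of the original slides), a minimal UDP sender using the BSD socket API might look like the sketch below; the receiver address 192.168.0.2 and port 5001 are placeholders. Swapping SOCK_DGRAM for SOCK_STREAM, plus a connect() call, would give the TCP equivalent.

```c
/* Minimal UDP sender: no connection set-up and no acknowledgements,
 * so far less per-packet protocol work than TCP, but no delivery guarantee. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);    /* SOCK_STREAM would give TCP */
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(5001);                      /* placeholder port */
    inet_pton(AF_INET, "192.168.0.2", &dst.sin_addr);  /* placeholder receiver */

    const char msg[] = "event fragment";
    /* One sendto() per datagram: the kernel adds an 8-byte UDP header and
     * hands the packet to IP; there is no stream or retransmission state. */
    if (sendto(s, msg, sizeof msg, 0, (struct sockaddr *)&dst, sizeof dst) < 0)
        perror("sendto");

    close(s);
    return 0;
}
```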

Page 8: Network Performance Optimisation and Load Balancing

Network Optimisation: How?
(CPU and PCI bus limitations as on Page 5)

Reduce per-packet overhead: replace TCP with UDP

Reduce the number of packets: Jumbo Frames

Page 9: Network Performance Optimisation and Load Balancing

Jumbo Frames

Normal Ethernet Maximum Transmission Unit (MTU): 1500 bytes

Ethernet with Jumbo Frames MTU: 9000 bytes
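
On Linux the MTU can be raised per interface; the sketch below uses the SIOCSIFMTU ioctl with the 9000-byte value from the slide, assuming an interface called eth0 and a NIC, driver and switch that all support jumbo frames.

```c
/* Sketch: raise the MTU of an interface to 9000 bytes (jumbo frames).
 * Requires root privileges and jumbo-frame support end to end. */
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket works for the ioctl */
    if (s < 0) { perror("socket"); return 1; }

    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* placeholder interface name */
    ifr.ifr_mtu = 9000;                           /* jumbo-frame MTU */

    if (ioctl(s, SIOCSIFMTU, &ifr) < 0)
        perror("SIOCSIFMTU");

    close(s);
    return 0;
}
```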

Page 10: Network Performance Optimisation and Load Balancing


Test set-up

• Netperf is a benchmark for measuring network performance

• The systems tested were 800 and 1800 MHz Pentium PCs using (optical as well as copper) Gbit Ethernet NICs.

• The network set-up was always a simple point-to-point connection with a crossed twisted pair or optical cable.

• Results were not always symmetric: with two PCs of different performance, the benchmark results were usually better if data was sent from the slow PC to the fast PC, i.e. the receiving process is more expensive.
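
Netperf itself does much more (receive-side rates, CPU utilisation and so on), but the core of such a one-way throughput test is roughly the sketch below; the peer address, port, message size and duration are illustrative placeholders.

```c
/* Rough sketch of a one-way UDP throughput test in the spirit of netperf:
 * send fixed-size messages for a few seconds and report Mbit/s on the sender. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(5001);                      /* placeholder port */
    inet_pton(AF_INET, "192.168.0.2", &dst.sin_addr);  /* placeholder peer */

    char buf[1472] = {0};     /* payload that fits one 1500-byte MTU frame */
    long long bytes = 0;
    struct timeval start, now;
    gettimeofday(&start, NULL);

    do {
        if (sendto(s, buf, sizeof buf, 0,
                   (struct sockaddr *)&dst, sizeof dst) > 0)
            bytes += sizeof buf;
        gettimeofday(&now, NULL);
    } while (now.tv_sec - start.tv_sec < 5);           /* run for about 5 s */

    double secs = (now.tv_sec - start.tv_sec)
                + (now.tv_usec - start.tv_usec) / 1e6;
    printf("sent %.1f Mbit/s\n", bytes * 8 / secs / 1e6);

    close(s);
    return 0;
}
```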

Page 11: Network Performance Optimisation and Load Balancing


Results with the optimisations so far

Frame size tuning: throughput (Mbit/s)

                          MTU 1500   MTU 9000
TCP throughput              597.67     704.94
UDP send performance        648.53     856.52
UDP receive performance     647.76     854.52

Page 12: Network Performance Optimisation and Load Balancing

Network Optimisation: How?
(CPU and PCI bus limitations as on Page 5)

Reduce per-packet overhead: replace TCP with UDP

Reduce the number of packets: Jumbo Frames

Reduce context switches: interrupt coalescence

Page 13: Network Performance Optimisation and Load Balancing

Interrupt Coalescence

[Diagram: packet processing without interrupt coalescence: the NIC raises an interrupt towards the CPU for every single packet it copies into memory. With interrupt coalescence, the NIC collects several packets in memory and interrupts the CPU only once for the whole group.]
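
On Linux the coalescence delay can be tuned through the ethtool interface; the sketch below uses the ETHTOOL_GCOALESCE/ETHTOOL_SCOALESCE ioctls, assuming an interface called eth0 and a driver that supports coalescence tuning. The 32 µs value matches one of the settings measured on Page 14.

```c
/* Sketch: ask the NIC to wait up to N microseconds for further packets
 * before raising a receive interrupt (interrupt coalescence). */
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct ethtool_coalesce coal;
    memset(&coal, 0, sizeof coal);
    coal.cmd = ETHTOOL_GCOALESCE;                 /* read current settings first */

    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* placeholder interface name */
    ifr.ifr_data = (char *)&coal;

    if (ioctl(s, SIOCETHTOOL, &ifr) < 0) { perror("ETHTOOL_GCOALESCE"); return 1; }

    coal.cmd = ETHTOOL_SCOALESCE;
    coal.rx_coalesce_usecs = 32;                  /* e.g. 32 or 1024 us, as tested */

    if (ioctl(s, SIOCETHTOOL, &ifr) < 0)
        perror("ETHTOOL_SCOALESCE");

    close(s);
    return 0;
}
```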

Page 14: Network Performance Optimisation and Load Balancing

Interrupt Coalescence: Results

[Chart: interrupt coalescence tuning: throughput (Mbit/s) for TCP, UDP send and UDP receive with the NIC waiting 32 µs vs. 1024 µs for new packets before interrupting the CPU (Jumbo Frames deactivated); the measured values lie between roughly 526 and 666 Mbit/s.]

Page 15: Network Performance Optimisation and Load Balancing

Network Optimisation: How?
(CPU and PCI bus limitations as on Page 5)

Reduce per-packet overhead: replace TCP with UDP

Reduce the number of packets: Jumbo Frames

Reduce context switches: interrupt coalescence

Reduce context switches: checksum offloading

Page 16: Network Performance Optimisation and Load Balancing

Checksum Offloading

• A checksum is a number calculated from the transmitted data and carried in each TCP/IP packet.

• Usually the CPU has to recalculate the checksum for each received TCP/IP packet in order to compare it with the checksum carried in the packet and so detect transmission errors.

• With checksum offloading, the NIC performs this task. The CPU therefore does not have to calculate the checksum and can perform other work in the meantime.
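
For reference, the computation being moved onto the NIC is the 16-bit ones' complement Internet checksum used by IP, TCP and UDP; the sketch below is a generic illustration of that algorithm (RFC 1071), not the NIC's actual implementation.

```c
/* Sketch: the 16-bit ones' complement Internet checksum that IP, TCP and UDP
 * use; with checksum offloading the NIC computes and verifies this instead
 * of the CPU. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {                     /* sum the data as 16-bit words */
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                         /* pad an odd trailing byte with zero */
        sum += (uint32_t)p[0] << 8;

    while (sum >> 16)                     /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;                /* ones' complement of the sum */
}

int main(void)
{
    const uint8_t packet[] = { 0x45, 0x00, 0x00, 0x1c, 0x12, 0x34 };
    printf("checksum: 0x%04x\n", inet_checksum(packet, sizeof packet));
    return 0;
}
```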

Page 17: Network Performance Optimisation and Load Balancing

Network Optimisation: How?
(CPU and PCI bus limitations as on Page 5)

Reduce per-packet overhead: replace TCP with UDP

Reduce the number of packets: Jumbo Frames

Reduce context switches: interrupt coalescence

Reduce context switches: checksum offloading

Or buy a faster PC with a better PCI bus…

Page 18: Network Performance Optimisation and Load Balancing


Load Balancing

Page 19: Network Performance Optimisation and Load Balancing


Load Balancing: Where?

[Diagram: the LHC-B data-acquisition architecture as on Page 3. Load balancing takes place at the Sub-Farm Controllers (SFC): there are 60 SFCs, each distributing events over a LAN to ca. 16 “off the shelf” PCs that run the Trigger Level 2 & 3 / Event Filter.]

Page 20: Network Performance Optimisation and Load Balancing

Load Balancing with round-robin

[Diagram: an SFC receives a stream of events over Gigabit Ethernet and forwards them over Fast Ethernet to farm nodes 1-4 in turn, with events queuing up at the nodes.]

Problem: The SFC doesn’t know if the node it wants to send the event to is ready to process it yet.
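
In code, round-robin dispatch amounts to nothing more than rotating an index, as in the illustrative sketch below (the node count of four matches the diagram). Nothing in it tells the SFC whether the selected node is actually idle.

```c
/* Sketch: round-robin event dispatch: the SFC cycles through its nodes
 * without knowing whether the chosen node is ready for another event. */
#include <stdio.h>

#define NUM_NODES 4

static int next_node = 0;

/* Pick the node for the next event by simple rotation. */
static int pick_node_round_robin(void)
{
    int node = next_node;
    next_node = (next_node + 1) % NUM_NODES;
    return node;
}

int main(void)
{
    for (int event = 0; event < 8; event++)
        printf("event %d -> node %d\n", event, pick_node_round_robin());
    return 0;
}
```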

Page 21: Network Performance Optimisation and Load Balancing

Load Balancing with control-tokens

[Diagram: an SFC receives events over Gigabit Ethernet; farm nodes 1-4 on Fast Ethernet send a token to the SFC whenever they are ready for another event (here tokens from nodes 2, 1, 3 and 1 are queued), and each event is forwarded to the sender of a queued token.]

With control tokens, nodes that are ready send a token, and every event is forwarded to the sender of a token.
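
An illustrative sketch of the token scheme follows: ready nodes deposit a token in a FIFO at the SFC, and each event goes to the node whose token is dequeued, so a busy node simply receives no further events until it sends a new token. The data structures and numbers are placeholders, not the actual SFC software.

```c
/* Sketch: token-based dispatch: nodes announce readiness with a token,
 * and the SFC only sends an event to a node it holds a token from.
 * (No overflow handling: this is only a sketch.) */
#include <stdio.h>

#define MAX_TOKENS 64

static int token_queue[MAX_TOKENS];   /* FIFO of node ids that sent a token */
static int head = 0, tail = 0;

static void token_received(int node)  /* called when a token arrives */
{
    token_queue[tail] = node;
    tail = (tail + 1) % MAX_TOKENS;
}

static int dispatch_event(int event)  /* returns the chosen node, or -1 */
{
    if (head == tail)                 /* no tokens: every node is busy */
        return -1;
    int node = token_queue[head];
    head = (head + 1) % MAX_TOKENS;
    printf("event %d -> node %d\n", event, node);
    return node;
}

int main(void)
{
    /* Nodes 2, 1, 3 and 1 report that they are ready (as in the diagram)... */
    token_received(2); token_received(1); token_received(3); token_received(1);

    /* ...so the next four events go to exactly those nodes, in that order. */
    for (int event = 0; event < 5; event++)
        if (dispatch_event(event) < 0)
            printf("event %d must wait: no ready node\n", event);
    return 0;
}
```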

Page 22: Network Performance Optimisation and Load Balancing

LHC Computing Grid Testbed Structure

100 CPU servers on GE, 300 on FE, 100 disk servers on GE (~50 TB), 10 tape servers on GE

[Diagram: backbone routers link 64 + 36 disk servers, 10 tape servers, 100 GE CPU servers and 200 + 100 FE CPU servers through 1, 3 and 8 GB lines; an SFC is connected to the testbed.]

