Network performance lessons from the coal face - Networkshop44


Networkshop March 2016

Network performance: lessons from the coal face

Chris Walker

20/03/16

Achieving high performance


● What can be achieved?

● How we achieved it
– Architecture
– Network tuning: TCP tuning, parallel transfers, multiple streams
– Monitoring

● Bottlenecks – found and fixed

● Conclusions

Motivation (LHC @CERN)

● Collisions every 25 ns – 100 PB/year

● QMUL – handles a small fraction of this


Network


● 1 Terabyte can be transferred in (see the sketch after this list):
– 100 Mbps network: 30 hrs
– 1 Gbps network: 3 hrs
– 10 Gbps network: 20 minutes

● Takes work to achieve this in practice:
– TCP tuning
– Find and eliminate bottlenecks
– Reduce packet loss

● Fasterdata.es.net – an excellent source of information
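A quick back-of-the-envelope check of those figures, as a minimal Python sketch (1 TB taken as 10^12 bytes; the slide's numbers are rounded up because real transfers never sustain the full line rate):

```python
# Ideal wire time for 1 TB (10^12 bytes) at three link speeds. Protocol
# overhead and untuned TCP push the real figures towards the rounded-up
# 30 h / 3 h / 20 min quoted on the slide.
TB_BITS = 1e12 * 8   # one terabyte in bits

for name, rate_bps in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    seconds = TB_BITS / rate_bps
    print(f"{name:>8}: {seconds / 3600:5.1f} h  ({seconds / 60:7.1f} min)")

# 100 Mbps:  22.2 h  ( 1333.3 min)
#   1 Gbps:   2.2 h  (  133.3 min)
#  10 Gbps:   0.2 h  (   13.3 min)
```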

LAN topology


● 2 × data transfer nodes (SE) connected at 10 Gbit/s
– Optimised for WAN transfers
– Fast (Lustre) filesystem

● Network
– 2 × 20 Gbit/s WAN links – HEP use one
– Previously: HEP had a 1 Gbit dedicated link + 80% of the resilient link


WAN performance

● April 2012: 1 Gbit dedicated link (saturated)
– Source-based routing (+ 80% of the resilient link)

● Sept 2013: 2 × 10 Gig links
– 1 × 10 Gig used by High Energy Physics (HEP)

[Traffic graphs: April 2012 and February 2013]



WLCG World sites


Data Transferred

● QMUL (2012)
– 2.6 PB downloaded (3.9 million files)
– 1.4 PB uploaded
– 870 MB/s peak rate
– 380 MB/s average on busy days

● ATLAS – 1 PB in 1 week (October 2012?) worldwide!



How TCP works: a very short overview

● Congestion window (CWND) = the number of packets the sender is allowed to send
– The larger the window size, the higher the throughput
– Throughput = Window size / Round-trip time (worked through in the sketch below)

● TCP slow start
– Exponentially increase the congestion window size until a packet is lost
– This gives a rough estimate of the optimal congestion window size

[Diagram: congestion window vs time – slow start (exponential increase), congestion avoidance (linear increase), packet loss triggering retransmit, timeout causing slow start again]
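To put Throughput = Window / RTT into numbers, a minimal Python sketch (the window sizes are illustrative; the 90 ms RTT is the QMUL → BNL figure used on the next slide):

```python
# Throughput achievable from a given TCP window over a given RTT:
#   throughput (bits/s) = window (bytes) * 8 / RTT (s)

def throughput_bps(window_bytes: float, rtt_s: float) -> float:
    return window_bytes * 8 / rtt_s

rtt = 0.090  # 90 ms, roughly QMUL -> BNL
for window in (64e3, 4e6, 112e6):   # 64 KB, 4 MB, 112 MB windows
    gbps = throughput_bps(window, rtt) / 1e9
    print(f"window {window/1e6:7.3f} MB over {rtt*1000:.0f} ms RTT -> {gbps:6.2f} Gbit/s")

# window   0.064 MB over 90 ms RTT ->   0.01 Gbit/s
# window   4.000 MB over 90 ms RTT ->   0.36 Gbit/s
# window 112.000 MB over 90 ms RTT ->   9.96 Gbit/s
```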


TCP Tuning

● Latency: time to send 1 packet from the source to the destination
● RTT: round-trip time
● Bandwidth × Delay = Bandwidth Delay Product (BDP)
– The number of bytes in flight needed to fill the entire path

● Example: 10 Gbps path; ping shows a 90 ms RTT (QMUL → BNL)
– BDP = 10 × 0.090 = 0.9 Gbits (112 MBytes)

● QMUL → Taiwan: 273 ms RTT (on a 10 Gbps path)
– BDP = 10 × 0.273 = 2.73 Gbits (340 MBytes)
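The same arithmetic as a minimal Python sketch (rates in Gbit/s, RTTs in seconds, matching the two examples above):

```python
# Bandwidth-delay product: how much data must be "in flight" to keep a
# path busy, i.e. the TCP window needed to fill it.

def bdp_bytes(rate_gbps: float, rtt_s: float) -> float:
    return rate_gbps * 1e9 * rtt_s / 8   # bits in flight -> bytes

for dest, rtt in [("BNL (90 ms)", 0.090), ("Taiwan (273 ms)", 0.273)]:
    mbytes = bdp_bytes(10, rtt) / 1e6
    print(f"QMUL -> {dest:<16}: BDP = {10 * rtt:.2f} Gbits = {mbytes:.0f} MBytes")

# QMUL -> BNL (90 ms)     : BDP = 0.90 Gbits = 112 MBytes
# QMUL -> Taiwan (273 ms) : BDP = 2.73 Gbits = 341 MBytes
```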



Effect of packet loss with distance

● Chart from fasterdata.es.net showing the effect of packet loss as distance (RTT) increases (see the sketch below)
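That behaviour is captured by the well-known Mathis et al. approximation, throughput ≈ (MSS / RTT) × 1/√loss (constant of order one omitted). A minimal Python sketch with an illustrative loss rate:

```python
# Mathis et al. approximation for single-stream TCP throughput under
# steady packet loss: rate <= (MSS / RTT) * 1 / sqrt(loss)
# (the constant of order one is omitted here).
from math import sqrt

MSS = 1460     # bytes per segment with a standard 1500-byte MTU
LOSS = 1e-4    # 0.01 % packet loss, chosen for illustration

for path, rtt in [("LAN", 0.001), ("UK", 0.010),
                  ("QMUL->BNL", 0.090), ("QMUL->Taiwan", 0.273)]:
    gbps = (MSS * 8 / rtt) / sqrt(LOSS) / 1e9
    print(f"{path:<13} RTT {rtt * 1000:5.0f} ms -> <= {gbps:5.2f} Gbit/s")

# LAN           RTT     1 ms -> <=  1.17 Gbit/s
# UK            RTT    10 ms -> <=  0.12 Gbit/s
# QMUL->BNL     RTT    90 ms -> <=  0.01 Gbit/s
# QMUL->Taiwan  RTT   273 ms -> <=  0.00 Gbit/s
```

The point of the chart falls straight out: a loss rate that is harmless on a LAN caps a trans-Atlantic or trans-Pacific stream at a tiny fraction of a 10 Gbit/s link.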

Multiple streams

● Parallel streams can help (a minimal sketch follows below)
– Potentially unfair to other users
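As an illustration of the idea only (the talk's transfers use GridFTP, which provides parallel streams natively), a minimal Python sketch that pulls one file over several TCP connections using HTTP range requests; the URL is a hypothetical placeholder and the server must honour Range headers:

```python
# Fetch one file over several TCP streams in parallel using HTTP Range
# requests. Each stream recovers from packet loss independently, so the
# aggregate can approach N times the single-stream limit.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

URL = "https://example.org/big-dataset.tar"   # hypothetical placeholder
STREAMS = 8

def fetch_range(start: int, end: int) -> bytes:
    req = Request(URL, headers={"Range": f"bytes={start}-{end}"})
    with urlopen(req) as resp:
        return resp.read()

def parallel_fetch(size: int) -> bytes:
    chunk = size // STREAMS
    ranges = [(i * chunk,
               size - 1 if i == STREAMS - 1 else (i + 1) * chunk - 1)
              for i in range(STREAMS)]
    with ThreadPoolExecutor(max_workers=STREAMS) as pool:
        parts = pool.map(lambda r: fetch_range(*r), ranges)
    return b"".join(parts)

# The total size would normally come from a HEAD request's Content-Length.
```

Because each stream backs off independently, the aggregate can approach several times the single-stream limit from the previous sketch – which is exactly why it can be unfair to other users sharing the path.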


TCP lessons


● Increase TCP buffers for distant transfers (see the sketch after this list)
– Fasterdata.es.net has good recommendations

● Packet loss needs eliminating

● Application
– Large buffers (not scp)
– Multiple streams
– GridFTP has both of these
– Aspera uses UDP (and GridFTP can)

● Fasterdata.es.net has excellent recommendations
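Applications can request larger buffers per socket as well as relying on kernel autotuning. A minimal Python sketch (the 112 MB figure is illustrative – roughly the QMUL → BNL BDP; on Linux the value actually granted is capped by net.core.rmem_max / wmem_max, which is what the fasterdata recommendations raise):

```python
# Request socket buffers large enough to cover the bandwidth-delay
# product of a long fat path (~112 MB for 10 Gbit/s at 90 ms RTT).
import socket

BUF = 112 * 1024 * 1024   # illustrative: roughly the QMUL -> BNL BDP

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)

# The kernel may grant less than requested (capped by rmem_max/wmem_max);
# Linux reports roughly double the usable value for internal bookkeeping.
print("send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```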


Bottlenecks found

● Gbit links connected at 100 Mbit
– GridFTP node, Dept, College
– Iperf tests with another Uni

● 1 min CPU limit

● 2 × 1 Gbit hashing (see the sketch after the chart)
– Can also cause packet loss

[Bar chart: proportion of traffic carried by each link under Layer 2 vs Layer 3+4 hashing, for the HP switch and the network card]
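Why the transmit-hash policy matters: Layer 2 hashing uses only MAC addresses, and every WAN flow from the transfer node crosses the same router MAC, so everything lands on one bonded link; Layer 3+4 also mixes in ports and spreads the flows. A toy model of the effect (deliberately simplified – not the Linux bonding driver's exact formula):

```python
# Toy model of bonding transmit-hash policies on a 2-slave bond.
# layer2 hashes MAC addresses only; layer3+4 also mixes in TCP ports.
# Simplified illustration, not the kernel's exact arithmetic.
import random

SLAVES = 2
ROUTER_MAC = 0x0000DEADBEEF   # every WAN flow shares this next-hop MAC
NODE_MAC = 0x0000CAFEF00D     # the transfer node's MAC

def layer2(src_mac: int, dst_mac: int) -> int:
    return (src_mac ^ dst_mac) % SLAVES

def layer3_4(src_mac: int, dst_mac: int, sport: int, dport: int) -> int:
    return ((src_mac ^ dst_mac) ^ (sport ^ dport)) % SLAVES

# 1000 transfers to the same router: random ephemeral source ports,
# fixed destination port (2811 is the GridFTP control port).
flows = [(random.randint(1024, 65535), 2811) for _ in range(1000)]

l2 = [layer2(NODE_MAC, ROUTER_MAC) for _ in flows]
l34 = [layer3_4(NODE_MAC, ROUTER_MAC, s, d) for s, d in flows]

print("layer2   share on slave 0:", l2.count(0) / len(l2))    # always 1.0
print("layer3+4 share on slave 0:", l34.count(0) / len(l34))  # roughly 0.5
```

Under Layer 2 hashing the extra bonded capacity simply goes unused, which is what the chart above shows.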

Routing


● Linux layer 2 – 1 Gig (not 10 Gig)

● 12 × 1G → 1G

● Router flapping
– Route 1 Gbit or advertise routes

● To CERN via the US


Routing problems with 10Gbit/s upgrade

● 4th September: 10 Gbit/s WAN upgrade
– UK sites: increased rates
– ASGC (Taiwan): rates decreased – route not advertised via GEANT


Firewalls


● ICMP
– Often blocked
– Gives timeouts rather than failures

● IPv6
– tracepath6 blocked (ICMP blocking)

● Barclays bank blocked
– Deep packet inspection and rewriting of http packets (but not https)

● scp failing half way through a transfer

● GridFTP slow performance
– 1 MB/s through the firewall, 50 MB/s avoiding the firewall

● GridFTP control connection forgotten by the firewall


IPv6


● Routes
– May be different to IPv4
– Geneva → QMUL via New York (fixed)

● Software (IPv6) / ASIC (IPv4)
– Older routers may give poor performance (see the perfSONAR talk)

● Preferred over IPv4 (see the sketch below)
– If an IPv6 address (AAAA record) is in DNS, it will be used by machines that think they are IPv6-connected

● Blocked differently by firewalls
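A quick way to see which address family a dual-stack client will try first, as a minimal Python sketch (the hostname is a placeholder; getaddrinfo returns results in the resolver's preferred order, IPv6 first on a host that believes it has working IPv6):

```python
# List the addresses a dual-stack client would try for a name that has
# both AAAA and A records; on a host that believes it has working IPv6,
# the IPv6 results normally sort first and get used.
import socket

HOST = "www.example.org"   # placeholder; substitute any dual-stack name

for family, _, _, _, sockaddr in socket.getaddrinfo(HOST, 443,
                                                    proto=socket.IPPROTO_TCP):
    label = "IPv6" if family == socket.AF_INET6 else "IPv4"
    print(f"{label}: {sockaddr[0]}")
```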


Jumbo Frames


● Ethernet
– MTU = 1500: normal
– MTU = 9000: jumbo (by convention)

● Janet network supposed to allow this
– Only for the brave at present
– Encapsulation: if a site uses MTU = 9000 jumbo frames, they are fragmented over Janet

● Path MTU discovery (see the sketch below)
– Sometimes blocked by firewalls
– More likely to be dropped (misconfigured switches etc.)
– net.ipv4.tcp_mtu_probing=1
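A quick way to check the local side of this on a Linux host, as a minimal sketch (the interface name eth0 is a placeholder; the /proc and /sys paths are standard Linux):

```python
# Check the local MTU and whether packetisation-layer PMTU probing
# (net.ipv4.tcp_mtu_probing) is enabled as a fallback for when ICMP-based
# path MTU discovery is blocked somewhere along the path.
from pathlib import Path

IFACE = "eth0"   # placeholder interface name

mtu = Path(f"/sys/class/net/{IFACE}/mtu").read_text().strip()
probing = Path("/proc/sys/net/ipv4/tcp_mtu_probing").read_text().strip()

print(f"{IFACE} MTU: {mtu}")
print(f"tcp_mtu_probing: {probing}  (0 = off, 1 = on after a black hole, 2 = always)")
```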



Network monitoring (perfSONAR + RIPE Atlas)

● Cacti
– Monitor packet loss
– 64-bit counters

● perfSONAR
– Bandwidth
– Latency
– Reverse traceroute

● RIPE Atlas probe (atlas.ripe.net)
– Latency
– Traceroute


Bufferbloat (www.bufferbloat.net)


● Chaotic and laggy network performance

● Buffers too big for the bandwidth

● Affects home users on low-bandwidth links with big buffers
– Packet loss signals the bandwidth limit too late


Conclusions


● Large transfers are routine
– But they take work (GridPP sites have this experience)
– Need a management layer

● Monitoring is vital
– Transfers
– Network

● Network
– Low packet loss
– A good relationship with the network team is useful

● Information
– Fasterdata.es.net


Acknowledgements


● Fasterdata.es.net (Brian Tierney) – many thanks for the TCP tuning slides

Duncan Rand

Brian Davies

Dan Traynor

Terry Froy


jisc.ac.uk


Christopher Walker

C.J.Walker@qmul.ac.uk
