ESLEA VLBI Bits&Bytes Workshop, 4-5 May 2006, R. Hughes-Jones Manchester
VLBI Data Transfer Tests
Recent and Current Work.
Richard Hughes-Jones The University of Manchester
www.hep.man.ac.uk/~rich/ then “Talks”
Outline
Throughput tests on Mark5s: TCP memory-to-memory tests, CPU load tests
Data delay on a TCP link: how suitable is TCP? (4th Year MPhys project, Stephen Kershaw & James Keenan); the effect of distance
Throughput on the 630 Mbit JB-JIVE UKLight link: TCP performance
Jodrell’s VLBI Mark5 Problem
Why were Jodrell to JIVE VLBI data transfers not able to reach 512 Mbit/s, even on UKLight?
Why can the Onsala Mk5 achieve 512 Mbit/s to the JIVE Mk5? Onsala can even achieve high rates transatlantic (iGrid2005, SC|05).
Identical Mk5 hardware to JBO, same kernel and drivers, longer links
A hint came as the general network load increased:
Normally Onsala – JIVE: iperf TCP ~900-950 Mbit/s, VLBI OK at 512 Mbit/s
Sometimes Onsala – JIVE: iperf TCP ~750 Mbit/s, VLBI not OK at 512 Mbit/s
Is it the network?
VLBI Network Topology
TCP Tests Jodrell’s Mark5
Standard Mark5 PCs: 1.2 GHz PIII
End-host iperf TCP flow, memory-to-memory only
960 Mbit/s with rtt 1 ms (JBO – Manchester); falls to 770 Mbit/s when rtt is 15 ms (JBO – JIVE)
JBO – Manchester: 94.7% kernel mode, 1.5% idle
JBO – JIVE: 96.3% kernel mode, 0.05% idle
No loss, no timeouts
200 times more TCPPureAcks seen for JBO – Manchester
TCPHPAcks about the same. Help with meanings please!
[Figure (mk5-606-jive_9Dec05 and mk5-606-g7_9Dec05): % CPU in kernel mode and % CPU mode (kernel, user, nice, idle) per trial; number of PureAcks per trial for both runs.]
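The TCPPureAcks and TCPHPAcks figures above look like the Linux TcpExt counters exported in /proc/net/netstat; that is an assumption, since the slide does not say where they were read. A minimal sketch of how such counters could be differenced across a test run, assuming the usual layout of that file, in which each section is a line of counter names followed by a line of values:

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {section: {counter: value}}.

    Each section is a line of names ("TcpExt: A B ...") followed by a line
    of values ("TcpExt: 1 2 ...").
    """
    rows = [line.split() for line in text.strip().splitlines()]
    stats = {}
    # Pair each header row with the value row that follows it.
    for names, values in zip(rows[0::2], rows[1::2]):
        section = names[0].rstrip(":")
        stats[section] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return stats


def ack_deltas(before, after, counters=("TCPPureAcks", "TCPHPAcks")):
    """Change in selected TcpExt counters between two snapshots."""
    old = parse_netstat(before)["TcpExt"]
    new = parse_netstat(after)["TcpExt"]
    return {name: new[name] - old[name] for name in counters}
```

Usage: read /proc/net/netstat into a string before and after an iperf run and pass both strings to `ack_deltas`; doing this for the JBO – Manchester and JBO – JIVE runs gives the kind of comparison quoted above.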
TCP Throughput & CPU Load: measure iperf TCP throughput and CPU load while running a CPU-intensive task at different priorities (nice: high number = low priority)
[Figure (mk5-606-g7_10Dec05): % CPU mode (kernel, user, nice, idle) and TCP throughput (Mbit/s) vs nice value of the CPU load, 0-20 (large value = low priority), with the no-CPU-load case for reference.]
JBO – Manchester, 1.2 GHz PIII: TCP throughput falls as the priority of the CPU load increases
% kernel mode drops and % nice increases as priority increases: CPU mode shares with % nice
No loss, no timeouts
JBO – Manchester, Asus NCCH-DL, 2.8 GHz Xeon: TCP throughput constant as priority increases
% kernel and % nice constant
No loss, no timeouts
[Figure (mk5-606-g7_17Jan05): TCP throughput (Mbit/s) and % CPU mode (kernel, user, nice, idle) vs nice value of the CPU load, 0-20 (large value = low priority), with the no-CPU-load case for reference.]
Onsala has a faster clock!
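The plots above split CPU time into kernel, user, nice and idle modes. The slides do not say how these were recorded, so the following is only a sketch of one common way to get such percentages on Linux: take two snapshots of the aggregate `cpu` line of /proc/stat and compare the tick counts. The field order (user, nice, system, idle, then iowait, irq, softirq, steal, guest...) is the standard /proc/stat layout; lumping the extra columns into "other" is a choice made here for illustration.

```python
def cpu_times(stat_text):
    """Aggregate CPU time (in ticks) by mode from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            user, nice, system, idle = (int(x) for x in fields[1:5])
            # iowait, irq, softirq, steal (where present) are lumped together;
            # later guest columns are skipped as they are already in user/nice.
            other = sum(int(x) for x in fields[5:9])
            return {"kernel": system, "user": user, "nice": nice,
                    "idle": idle, "other": other}
    raise ValueError("no aggregate 'cpu' line found")


def cpu_mode_percent(before, after):
    """Percentage of CPU time in each mode between two /proc/stat snapshots."""
    old, new = cpu_times(before), cpu_times(after)
    delta = {mode: new[mode] - old[mode] for mode in new}
    total = sum(delta.values())
    return {mode: 100.0 * ticks / total for mode, ticks in delta.items()}
```

Reading /proc/stat just before and just after an iperf run and passing both strings to `cpu_mode_percent` yields numbers comparable to the "% kernel mode" and "% idle" figures quoted for JBO.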
TCP Throughput while reading SuperStor: measure iperf TCP throughput while reading data from disk to memory
Reading SuperStor from disk to memory only: 1.48 Gbit/s
Reading SuperStor with iperf running: 1.15 Gbit/s; iperf TCP rate 420 Mbit/s
15 ms SS read spacing: ~1 Gbit/s to memory
Corresponding CPU load
[Figure (mk5-606-g7_17Jan05): TCP throughput (Mbit/s) vs nice value with 15 ms SS read spacing; % CPU mode (kernel, user, nice, idle) vs test number, with the no-CPU-load case for reference.]
TCP Delay and VLBI Transfers
Manchester 4th Year MPhys Project
by Stephen Kershaw & James Keenan
VLBI Application Protocol
VLBI data is Constant Bit Rate (CBR)
tcpdelay: an instrumented TCP program that emulates sending CBR data and records the relative 1-way delay
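The slides do not give tcpdelay's message format, so the sketch below is hypothetical: each fixed-size message carries its number and send time in a small header (`HEADER`, `make_message` and `read_header` are illustrative names), and the receiver subtracts send time from receive time. Because the sender and receiver clocks need not be synchronised, only the delay relative to the minimum is kept, which is what "relative 1-way delay" suggests.

```python
import struct

# Hypothetical header: 32-bit message number + 64-bit send time in seconds,
# network byte order, no padding (12 bytes in total).
HEADER = struct.Struct("!Id")


def make_message(seq, send_time, size=1448):
    """Build a fixed-size CBR message carrying its number and send timestamp."""
    header = HEADER.pack(seq, send_time)
    return header + bytes(size - len(header))  # zero padding to full size


def read_header(message):
    """Recover (seq, send_time) from the start of a received message."""
    return HEADER.unpack_from(message)


def relative_one_way_delays(send_times, recv_times):
    """Receive minus send time, shifted so the smallest delay is zero.

    The two clocks may have an unknown offset, so only the variation
    in 1-way delay is meaningful.
    """
    raw = [r - s for s, r in zip(send_times, recv_times)]
    base = min(raw)
    return [d - base for d in raw]
```

Since TCP is a byte stream, a receiver built this way has to read exactly `size` bytes per message before decoding the header.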
[Figure: timeline of data messages (Data1, Data2, ...) and timestamps (Timestamp1 ... Timestamp5) passing from the sender through TCP & the network to the receiver; a second timeline between sender and receiver shows segments, ACKs, the RTT and a packet loss.]
Segment time on wire = bits in segment / BW
Remember the Bandwidth*Delay Product: BDP = RTT * BW
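The two rules of thumb above reduce to one-line calculations. This sketch evaluates them with numbers taken from the talk: the 27 ms Man-ukl-ams-prod-man route at 1 Gbit/s gives the 3.4 MByte BDP quoted later, and a 1448 byte segment takes about 11.6 µs on a 1 Gbit/s wire (payload bits only; Ethernet, IP and TCP headers are ignored here).

```python
def bdp_bytes(rtt_s, bw_bit_s):
    """Bandwidth*Delay Product, BDP = RTT * BW, converted to bytes."""
    return rtt_s * bw_bit_s / 8


def segment_wire_time_s(segment_bytes, bw_bit_s):
    """Segment time on wire = bits in segment / BW (headers not included)."""
    return segment_bytes * 8 / bw_bit_s


# The 27 ms Man-ukl-ams-prod-man route at 1 Gbit/s.
print(f"BDP = {bdp_bytes(0.027, 1e9) / 1e6:.1f} MByte")
print(f"1448 byte segment on the wire: {segment_wire_time_s(1448, 1e9) * 1e6:.1f} us")
```

A TCP buffer smaller than the BDP caps throughput at buffer / RTT, which is why a 10 MByte buffer is used on the 27 ms route.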
Check the Send Time
10,000 messages; message size: 1448 B; wait time: 0; TCP buffer 64k
[Figure: send time (s) vs message number for 10,000 packets; scale bar 1 s.]
Slope 0.44 ms/message; expect 42 messages/rtt, ~0.6 ms/message
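The expected ~0.6 ms/message follows if the 64k window releases about 42 messages each round trip. A small sketch of that arithmetic, assuming an rtt of 26 ms (taken from the 1-way delay detail for the same test parameters): 26 ms / 42 ≈ 0.62 ms per message, the measured slope of 0.44 ms/message would correspond to about 59 messages per rtt, and 42 messages of 1448 B per 26 ms is only about 18.7 Mbit/s of user data.

```python
def expected_ms_per_message(rtt_ms, messages_per_rtt):
    """If the window releases a fixed number of messages each rtt,
    the average send time per message is rtt / messages_per_rtt."""
    return rtt_ms / messages_per_rtt


def implied_messages_per_rtt(rtt_ms, slope_ms_per_message):
    """Messages per rtt implied by a measured send-time slope."""
    return rtt_ms / slope_ms_per_message


def window_limited_mbit_s(messages_per_rtt, message_bytes, rtt_ms):
    """Average user-data rate when one window of messages is sent per rtt."""
    bits = messages_per_rtt * message_bytes * 8
    return bits / (rtt_ms * 1000.0)  # bits per microsecond == Mbit/s
```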
Send Time Detail
[Figure: send time (s) vs message number, detail; scale bar 100 ms. Annotations: Message 76 and Message 102 (26 messages), one rtt, about 25 µs.]
1-Way Delay
10,000 messages; message size: 1448 B; wait time: 0; TCP buffer 64k
[Figure: 1-way delay vs message number for 10,000 packets; scale bar 100 ms.]
1-Way Delay Detail
10,000 messages; message size: 1448 B; wait time: 0; TCP buffer 64k
[Figure: 1-way delay vs message number, detail; scale bar 100 ms. Annotations: = 1.5 x RTT; = 1 x RTT (26 ms); ≠ 0.5 x RTT.]
Why not 1 rtt? Why does it vary?
Effect of the “send time delay”? TCP slow start?
Comparison of Send Time & 1-way delay
[Figure: send time (s) vs message number, scale bar 100 ms, with Message 76 and Message 102 (26 messages) marked, compared with the 1-way delay.]
1 way delay – 10000 packets
[Figure: 1-way delay (μs) vs packet number. Annotations: packet 1214; a span of 1575 packets, ~5.5 x RTT.]
1-Way Delay 724 byte msg
10,000 messages; message size: 724 bytes; wait times: 20, 25, 30, 35, 40, 45 μs; TCP buffer 64k
[Figure: 1-way delay vs message number; scale bar 100 ms.]
1-Way Delay 724 bytes Detail
10,000 messages; message size: 724 bytes; wait times: 20, 25, 30, 35, 40, 45 μs; TCP buffer 64k
[Figure: 1-way delay vs packet number, detail; scale bar 100 ms.]
Regular cycle of ~125 packets
1-Way Delay 1448 byte msg
Route: Man-ukl-ams-prod-man
rtt 27 ms; 10,000 messages; message size: 1448 bytes; wait times: 0 μs; BDP = 3.4 MByte; TCP buffer 10 MByte
[Figure: one-way delay (μs) vs packet number; scale bar 50 ms.]
[Figure: Web100 plot of PktsOut (Delta) and PktsIn (Delta) (number of packets) and CurCwnd (Value) vs time (ms).]
The Web100 plot starts after 5.6 s due to clock sync.
~400 pkts/10 ms: rate similar to iperf
1-Way Delay with packet drop
Route: LAN gig8-gig1; ping 188 μs
10,000 messages; message size: 1448 bytes; wait times: 0 μs
Drop 1 in 1000
[Figure: 1-way delay vs message number; scale bar 5 ms. Annotations: 800 μs and 28 ms.]
TCP on the 630 Mbit Link
Jodrell – UKLight – JIVE
TCP Throughput on 630 Mbit UKLight
Manchester gig7 – JBO 606; 4 Mbyte TCP buffer
[Figure, three panels (test 0, test 1, test 2): TCP achieved throughput (Mbit/s, instantaneous BW) and CurCwnd (Value) vs time (s), 0-120 s. Annotations: Dup ACKs seen; other reductions.]
Any Questions?
More Information Some URLs 1
UKLight web site: http://www.uklight.ac.uk
MB-NG project web site: http://www.mb-ng.net/
DataTAG project web site: http://www.datatag.org/
UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
“Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards”, FGCS Special issue 2004, http://www.hep.man.ac.uk/~rich/
TCP tuning information may be found at: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
TCP stack comparisons: “Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks”, Journal of Grid Computing 2004
PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
More Information Some URLs 2
Lectures, tutorials etc. on TCP/IP:
www.nv.cc.va.us/home/joney/tcp_ip.htm
www.cs.pdx.edu/~jrb/tcpip.lectures.html
www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS
www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm
www.cis.ohio-state.edu/htbin/rfc/rfc1180.html
www.jbmelectronics.com/tcp.htm
Encyclopaedia: http://www.freesoft.org/CIE/index.htm
TCP/IP resources: www.private.org.il/tcpip_rl.html
Understanding IP addresses: http://www.3com.com/solutions/en_US/ncs/501302.html
Configuring TCP (RFC 1122): ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
Assigned protocols, ports etc. (RFC 1010): http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols
Backup Slides
Latency Measurements
UDP/IP packets sent between back-to-back systems: processed in a similar manner to TCP/IP, but not subject to flow control and congestion avoidance algorithms. Used the UDPmon test program.
Latency: round trip times measured using request-response UDP frames; latency as a function of frame size
Slope is given by $s = \sum_{\text{data paths}} \frac{dt}{db}$, summed over mem-mem copy(s) + PCI + Gigabit Ethernet + PCI + mem-mem copy(s)
Intercept indicates: processing times + HW latencies
Histograms of ‘singleton’ measurements tell us about: the behaviour of the IP stack, the way the HW operates, interrupt coalescence
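A sketch of how the slope and intercept can be pulled out of latency-vs-frame-size data with an ordinary least-squares line, and how an expected slope follows from the sum of dt/db (one over the transfer rate) along the data paths. The fitting method and any path rates fed to `slope_from_paths` are illustrative assumptions, not UDPmon's own analysis; the one certain figure is that Gigabit Ethernet alone moves 125 bytes/µs, i.e. contributes 0.008 µs per byte.

```python
def fit_latency(sizes_bytes, latencies_us):
    """Least-squares line latency = intercept + slope * size.

    Returns (slope in us/byte, intercept in us).
    """
    n = len(sizes_bytes)
    mean_x = sum(sizes_bytes) / n
    mean_y = sum(latencies_us) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sizes_bytes, latencies_us))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_from_paths(rates_bytes_per_us):
    """Expected slope: sum of dt/db (= 1 / rate) over every data path."""
    return sum(1.0 / rate for rate in rates_bytes_per_us)
```

Comparing the fitted slope with `slope_from_paths` for the memory, PCI and Ethernet rates of a given host shows which path dominates, while the fitted intercept estimates the fixed processing and hardware latency.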
Throughput Measurements
UDP Throughput: send a controlled stream of UDP frames spaced at regular intervals
Test sequence, sender to receiver:
Zero stats; receiver replies OK done
Send data frames at regular intervals: n bytes each, a set number of packets, a set wait time between frames; record time to send, time to receive and the inter-packet time (histogram)
Signal end of test; receiver replies OK done
Get remote statistics; receiver sends: no. received, no. lost + loss pattern, no. out-of-order, CPU load & no. of interrupts, 1-way delay
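The throughput plots that follow show "Recv Wire rate" against the spacing between frames. A sketch of the relationship for Gigabit Ethernet, assuming the wire rate counts every byte a frame occupies on the wire: UDP and IPv4 headers, Ethernet header and FCS, preamble and inter-frame gap (66 bytes on top of the user data). Whether UDPmon uses exactly this overhead is an assumption.

```python
# Assumed per-frame overhead on Gigabit Ethernet beyond the UDP payload, bytes:
# UDP 8 + IPv4 20 + Ethernet header 14 + FCS 4 + preamble 8 + inter-frame gap 12.
OVERHEAD_BYTES = 8 + 20 + 14 + 4 + 8 + 12  # = 66


def wire_rate_mbit_s(payload_bytes, spacing_us, line_rate_mbit_s=1000.0):
    """Wire rate of a paced UDP stream: bits on the wire per frame / spacing.

    Frames cannot leave faster than line rate, so the result is capped there.
    """
    bits = (payload_bytes + OVERHEAD_BYTES) * 8
    return min(bits / spacing_us, line_rate_mbit_s)  # bits per us == Mbit/s


def min_spacing_us(payload_bytes, line_rate_mbit_s=1000.0):
    """Smallest useful frame spacing: the frame's own time on the wire."""
    return (payload_bytes + OVERHEAD_BYTES) * 8 / line_rate_mbit_s
```

With this model, 1472 byte frames reach line rate at spacings of about 12.3 µs or less and give about 615 Mbit/s at 20 µs; curves that fall below it indicate losses or a bottleneck in the hosts or network.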
PCI Bus & Gigabit Ethernet Activity
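PCI Activity: logic analyser with PCI probe cards in the sending PC, a Gigabit Ethernet fibre probe card, and PCI probe cards in the receiving PC
[Figure: sending PC (CPU, mem, chipset, NIC on the PCI bus) linked by Gigabit Ethernet to the receiving PC (NIC, chipset, CPU, mem on the PCI bus), with the PCI and Gigabit Ethernet probes feeding the logic analyser display.]
Possible Bottlenecks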
“Server Quality” Motherboards
SuperMicro P4DP8-2G (P4DP6)
Dual Xeon, 400/522 MHz front side bus
6 PCI / PCI-X slots on 4 independent PCI buses: 64 bit 66 MHz PCI, 100 MHz PCI-X, 133 MHz PCI-X
Dual Gigabit Ethernet
Adaptec AIC-7899W dual channel SCSI
UDMA/100 bus master/EIDE channels: data transfer rates of 100 MB/sec burst
“Server Quality” Motherboards
Boston/Supermicro H8DAR
Two dual core Opterons
200 MHz DDR memory
Theory BW: 6.4 Gbit
HyperTransport
2 independent PCI buses, 133 MHz PCI-X
2 Gigabit Ethernet
SATA
(PCI-e)
Network switch limits behaviour
End-to-end UDP packets from udpmon
Only 700 Mbit/s throughput
Lots of packet loss
Packet loss distribution shows throughput limited
[Figure (w05gva-gig6_29May04_UDP): received wire rate (Mbit/s) and % packet loss vs spacing between frames (μs) for packet sizes 50 to 1472 bytes; 1-way delay (μs) vs packet number at 12 μs wait, full run (packets 0-600) and detail (packets 500-550).]
10 Gigabit Ethernet: UDP Throughput
1500 byte MTU gives ~2 Gbit/s. Used 16144 byte MTU, max user length 16080.
DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes: wire rate throughput of 2.9 Gbit/s
CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64 bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.7 Gbit/s
SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.4 Gbit/s
[Figure (an-al 10GE Xsum 512kbuf MTU16114 27Oct03): received wire rate (Mbit/s) vs spacing between frames (μs) for packet sizes 1472 to 16080 bytes.]
10 Gigabit Ethernet: Tuning PCI-X
16080 byte packets every 200 µs; Intel PRO/10GbE LR Adapter
PCI-X bus occupancy vs mmrbc: logic analyser traces for mmrbc 512, 1024, 2048 and 4096 bytes, showing CSR access, PCI-X sequence, data transfer, interrupt & CSR update (5.7 Gbit/s at mmrbc 4096 bytes)
Measured times, and times based on PCI-X times from the logic analyser
Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s
[Figure: PCI-X transfer time (μs) and rate (Gbit/s: measured rate, rate from expected time, max PCI-X throughput) vs max memory read byte count, for Kernel 2.6.1#17 HP Itanium Intel10GE Feb04 and for DataTAG Xeon 2.2 GHz.]
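Why a larger mmrbc helps can be seen by counting PCI-X memory read sequences: each 16080 byte packet needs 32 reads at mmrbc 512 bytes but only 4 at 4096 bytes, and every sequence carries some fixed setup cost. The sketch below uses a deliberately simple model: `overhead_us` is a hypothetical per-sequence cost to be fitted from logic analyser traces, and the raw bus rate of 8 bytes x 133 MHz (1064 MByte/s) is the nominal 64 bit PCI-X figure.

```python
import math


def pcix_reads(packet_bytes, mmrbc_bytes):
    """Memory read sequences needed to move one packet across PCI-X
    when each read is limited to mmrbc bytes."""
    return math.ceil(packet_bytes / mmrbc_bytes)


def pcix_transfer_time_us(packet_bytes, mmrbc_bytes, overhead_us,
                          bus_mbyte_s=1064.0):
    """Rough bus occupancy per packet: a hypothetical fixed overhead per read
    sequence plus the data time at the raw bus rate (MByte/s == bytes/us)."""
    reads = pcix_reads(packet_bytes, mmrbc_bytes)
    return reads * overhead_us + packet_bytes / bus_mbyte_s
```

In this model the saving from 512 to 4096 bytes is 28 x `overhead_us` per packet, which is the trend the measured transfer times follow.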
Tests on the UKLight switched light-path Manchester – Dwingeloo
Throughput as a function of inter-packet spacing (2.4 GHz dual Xeon machines)
Packet loss for small packet sizes. Maximum size packets can reach full line rate with no loss, and there was no re-ordering (plot not shown).
[Figure (gig03-jiveg1_UKL_25Jun05): received wire rate (Mbit/s) and % packet loss (log scale) vs spacing between frames (μs) for packet sizes 50 to 1472 bytes.]
UKLight using Mk5 recording terminals