Rutherford Research / Apparent Networks

Size Matters: Performance Benefits (and Obstacles) of Jumbo Packets
9k MTU Project
• Test global path MTU on Abilene, CA*net4, CUDI and other R&E networks, plus create a useful researcher mapping tool
• Internet2 ATEAM - Advanced Test Engineering and Measurement
  • www.ateam.info
• Bill Rutherford (Rutherford Research/GAIT – Project Leader)
• Kevin Walsh, Nathaniel Mendoza (San Diego Supercomputing Center/SDSC)
• John Moore (Centaur Internet2 Technology Evaluation Center ITEC/NCSU North Carolina State University)
• Loki Jorgenson (Apparent Networks/SFU)
• Paul Schopis (Internet2 Technology Evaluation Center/ITEC-Ohio/OARnet)
• Jorge Hernandez Serran (CUDI2/UNAM Mexico)
• Dave Hartzell (NASA Ames Research Center)
• Bill Jones (University of Texas Austin)
• Woojin Seok (Supercomputing Center Korea/KISTI)
9k MTU Project
• Preliminary project flow
• Several Internet2 Joint Techs presentations
• Participation in HEP TRIUMF to CERN test run (Corrie Kost, Steven McDonald)
• Collaboration with equipment vendors
• Comprehensive testing on Abilene and CA*net4
• First international 9k connection between I2 and C4 via StarLight
• Academic network and mapping system
9k MTU Project

1. Create Project Plan → Plan | Kevin Walsh/Bill Rutherford | done
2. Formulate 9k MTU Interesting Target List → Abilene Target List | Wednesday, July 17, 2002 | Paul Love/Kevin Walsh | done
3. Probe Abilene Targets for Basic MTU Capabilities → Abilene Probe data | Friday, July 19, 2002 | Bill Rutherford/Nathaniel Mendoza/Loki Jorgenson/Kevin Walsh/Paul Schopis | done
4. Formulate Presentation based on combined Probe Data → Joint Techs Presentation | July 28 – August 1, 2002 | All | done
5. Formulate MTU Test SW for Spirent SM6000B → Procedure/SW | September 2002 | Kevin Walsh/Fred Klassen | in progress - late
6. Detailed Analysis of MTU Capabilities → Experiment Results | October 2002 | All | waiting for 9k MTU config mods at all sites; preliminary 9k MTU testing completed on CA*net 4 as of Dec 30/02
7. Funding Application for Further Work → Application | October 2002 | All | preliminary discussion; pending funding arrangement
8. Global 9k MTU Route Map → Map | March 2004 | All | preliminary discussion
9. Probe TRIUMF to CERN for MTU Capabilities → HEP Probe data | Friday, July 26, 2002 | Bill Rutherford/Loki Jorgenson/Steven McDonald | done as of Sept 30/02
9k MTU Project
• Contributions
  • Matt Mathis (Pittsburgh Supercomputing Center)
    • Theoretical considerations: MTU role in TCP
    • http://www.psc.edu/~mathis/MTU
  • Joe St. Sauver (University of Oregon)
    • Practical MTU considerations for campus and equipment issues
    • http://darkwing.uoregon.edu/~joe/jumbos/jumbo-frames.ppt
  • Phillip Dykstra (Chief Scientist, WareOnEarth Communications Inc.)
    • MTU-related network tuning issues
    • http://sd.wareonearth.com/woe/Briefings/tcptune.ppt
  • Bryan Caron (Network Manager Subatomic Physics, University of Alberta)
    • CA*net4 testing
    • http://www.phys.ualberta.ca/~caron/
9k MTU Project - Tools and Equipment
• NLANR Iperf
  • http://dast.nlanr.net/Projects/Iperf
  • tool to measure maximum TCP bandwidth
  • reports bandwidth, delay, jitter, datagram loss
• Apparent Networks AppareNet network intelligence system
  • http://www.apparentNetworks.com
• Spirent Communications SmartBits 6000 series network analyzer
  • http://www.spirentcom.com
  • automated testing from scripts
  • high level of accuracy
Why Jumbo? Performance
• Benefits for high performance transfers
  • High Energy Physics – TRIUMF to CERN test run
  • National Light Rails/Paths
  • Grid Networks/Next Generation Clusters
  • Meteorology / Astrophysics / Bioinformatics
  • Collaborative/interactive/video – access grid
• End-to-end path
  • NIC-to-NIC MTU requirement
  • End station is typically the bottleneck
  • Advent of Gig-E to the desktop
TCP Steady State
• If TCP window size and network capacity are not rate-limiting factors, then (roughly):

    e2e throughput < (0.7 × Maximum Segment Size) / (Round Trip Time × sqrt(loss))

  (M. Mathis, et al.)
• Double the MSS, double the throughput
• Halve the latency, double the throughput (shortest path matters)
• Halve the loss rate, ~40% higher throughput
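The relation above can be sanity-checked numerically. A minimal sketch; the 50 ms RTT and 0.01% loss figures are illustrative, not from the project's measurements:

```python
from math import sqrt

def mathis_throughput_mbps(mss_bytes, rtt_s, loss):
    """Mathis et al. steady-state TCP bound:
    throughput < 0.7 * MSS / (RTT * sqrt(loss)), returned in Mbps."""
    return 0.7 * mss_bytes * 8 / (rtt_s * sqrt(loss)) / 1e6

# Standard 1500-byte MTU (MSS = 1460) vs. 9000-byte jumbo MTU (MSS = 8960),
# over an illustrative 50 ms RTT path with 0.01% loss:
std = mathis_throughput_mbps(1460, 0.050, 1e-4)
jumbo = mathis_throughput_mbps(8960, 0.050, 1e-4)
print(round(std), round(jumbo))       # → 16 100
print(round(jumbo / std, 2))          # → 6.14 (the MSS ratio 8960/1460)
```

The ratio shows why the slide says "double the MSS, double the throughput": the bound is linear in MSS, so a 6x larger segment gives a 6x higher ceiling at the same RTT and loss rate.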
Frame Size vs. MTU vs. MSS – Ethernet Example
[Diagram: Ethernet framing mapped to the OSI stack (7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data Link, 1 Physical). Each frame is PRE | MAC/LLC | IP Header | TCP Header | Payload Data | FCS | IFG. The Maximum Segment Size (MSS, 1460 bytes) is the TCP payload; the packet (IP header + TCP header + payload, 1500 bytes) is the Maximum Transmission Unit (MTU); the frame (1518 bytes) adds the MAC/LLC header and FCS.]
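The nesting above reduces to simple arithmetic; a sketch assuming 20-byte IPv4 and 20-byte TCP headers (no options) and 18 bytes of Ethernet framing (14-byte MAC header plus 4-byte FCS, excluding preamble and IFG):

```python
def ethernet_sizes(mtu):
    """Derive the MSS and on-the-wire frame size from an Ethernet MTU."""
    mss = mtu - 20 - 20    # payload left after IPv4 + TCP headers
    frame = mtu + 18       # MAC header + FCS around the IP packet
    return mss, frame

print(ethernet_sizes(1500))   # standard Ethernet → (1460, 1518)
print(ethernet_sizes(9000))   # 9k jumbo → (8960, 9018)
```

The 9018-byte frame size is the same figure the "Avoiding GigE MTU problems" slide recommends configuring on the NIC to maintain a 9000-byte MTU.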
Abilene Results: Iperf, NCSU to SDSC

[Chart: Bandwidth vs. MTU, 2-way (Mbps). Series SDSC->NC and NC->SDSC; MTU sizes 512, 1500, 2048, 3072, 4096, 5120, 6144, 7168 and 8192 bytes; y-axis 0–1200 Mbps.]
About aNA
• appareNet Network for Academics
• Currently 16 sequencers across CA*net and Abilene
• NIS in Vancouver, Canada
  • 10 Gig-E/Jumbo hosts
• 4 nodes in Canada
  • BCNET
  • Netera Alliance
  • CA*net NOC
  • ACORN-NS
appareNet – network intelligence
• Uses light, non-intrusive, adaptive active probing
  • ICMP or UDP packets in various configurations
• Point-and-shoot to most IP addresses
• Performs comprehensive network path characterization
• Performs expert system diagnostics
• Single-ended two-way measures (e.g. half-duplex different from full-duplex)
• Samples network to generate the same view as a best-effort application (pre-TCP)
Abilene & CA*net Testing - 2003
GigE 2-way bandwidth vs. MTU from Kansas City to various universities

[Chart: 2-way bandwidth (Mbps, 0–2000) vs. MTU size (0–10000 bytes); series for 512, standard 1500, 2048, 3072, 4096, 5120, 6144, 7168, 8192 and 9000 MTU.]
CA*net – 9k ORANs
CA*net4 Testing - 2004
CA*net4 - MTU Performance: Vancouver, Ottawa, Dalhousie, Edmonton

[Chart: 2-way bandwidth (Mbps, 0–2000) vs. MTU size (0–10000 bytes); the standard 1500 MTU is marked for reference.]
L2 Trends
• Cisco ONS 15454: up to 10000 MTU
  • CA*net4 L2 is implemented with the ONS 15454
• Cisco Catalyst 6000/3750: up to 9216/9018 MTU
• Foundry BigIron MG8: up to 9000 MTU
  • "Jumbo frame support, up to 9 Kb, to expand data payload for network intense data transfer applications such as Storage Area Network (SAN) and Grid Computing."
• Nortel BayStack 380: up to 9216 MTU
  • "Jumbo frame support of up to 9,216 bytes is provided on each port for applications requiring large frames such as graphics and video applications."
• Intel gigE and 10 x gigE NICs: up to 16128 MTU
• SysKonnect gigE NICs: up to 9000 MTU
L3 Trends
• Cisco 12000/7300: up to 9180/9192 MTU
• Juniper M & T series: up to 9192 MTU
  • Abilene backbone is mainly Juniper T640
  • CA*net4 backbone routers are Juniper M20 or M40
• Extreme 10800 series: up to 9126 MTU
  • "Jumbo Frames – Studies show server CPU utilization is reduced by as much as 50% with the use of jumbo frames in clustering applications. Extreme Networks has optimized around support for a 9K jumbo frame that delivers the most optimized performance for cluster applications."
Multiprocessor OS
[Diagram: a 64-bit symmetric multiprocessor host. A dual-port 10 x gigabit Ethernet NIC (single-mode fibre, 10 km, 1310 nm, ~1000 megabytes/sec per 10 gigE port) with rx/tx FIFO buffers and a programmable filter feeds a 64-bit parallel data bus (~2000 megabytes/sec). Data passes through the NIC driver and driver buffer, a per-CPU kernel buffer managed by a kernel daemon, then the socket buffer and application buffer. 9k frames on the host side meet 1.5k frames beyond the switch/router, separated by VLANs.]
Scalability Issues
• Is the current code approach scalable?
• What is the strategy for minimizing memory footprint and processing overhead?
• What are the implications for protocols?
  • more stack tuning? (e.g. variable packet length?)
  • byte counters? (e.g. IPv6 has a 16 bit counter)
  • inter-packet gaps? (e.g. IEEE 802.3z burst mode)
A Look Ahead
• Next-generation optical network-based virtual memory (VM)
• VM paging from disk scales with block transfer rate and mechanical seek latency
• VM paging from network scales with packet transfer rate and round trip time
• VM thrashing when OS is dominated by slow virtual memory swaps
Application Layer
• Looking ahead at end-to-end application-layer sensitivity:
• Video or graphics (Nortel): throughput, CPU utilization, jitter, drops
• Storage Area Network and Grid (Foundry): throughput, CPU utilization
• Cluster applications (Extreme): throughput, CPU utilization
Initial CA*net4 Runs
• SDSC to Halifax
Initial CA*net4 Runs
• SDSC to CANARIE
Initial CUDI Runs
• SDSC to UNAM
Preventing MTU conflicts – Network Negotiation
[Diagram: a 9000 MTU server and a 1500 MTU client attached to a mixed-MTU network. The network must be able to handle MTU negotiation.]
MTU handling via Fragmentation
[Diagram: a 9000 MTU server sends a 9000-byte request through a router toward a 1500 MTU client; the router fragments it into 1500-byte packets.]

Advantages:
• commonly implemented

Disadvantages:
• extreme load on the router
• some clients cannot reassemble packets

Applications:
• ping
• router advertisements
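The router's work in that diagram can be illustrated by computing the fragments; a sketch assuming a 20-byte IPv4 header and the rule that non-final fragment payloads must be multiples of 8 bytes (IPv4 carries offsets in 8-byte units):

```python
def fragment(datagram_len, mtu, ihl=20):
    """Split an IPv4 datagram into (offset_bytes, payload_len) fragments
    for a link with the given MTU."""
    payload = datagram_len - ihl
    per_frag = (mtu - ihl) // 8 * 8   # largest 8-byte-aligned payload per fragment
    frags, offset = [], 0
    while offset < payload:
        size = min(per_frag, payload - offset)
        frags.append((offset, size))
        offset += size
    return frags

# A 9000-byte datagram from the jumbo server crossing the 1500-MTU link:
frags = fragment(9000, 1500)
print(len(frags))    # → 7 fragments
print(frags[-1])     # → (8880, 100): a tiny final fragment
```

Seven fragments (and seven headers) per jumbo datagram is the per-packet overhead behind the "extreme load on router" disadvantage above.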
MTU handling via RFC 1191 PMTU discovery
[Diagram: a 9000 MTU server sends a 9000-byte, DF-marked request toward a 1500 MTU client; the router returns an ICMP message carrying the 1500-byte next-hop MTU, and the server resends at 1500 bytes with DF set.]

Advantages:
• router is not loaded
• maximum performance achieved

Disadvantages:
• reliance on ICMP
• easy to mis-configure

Applications:
• almost all modern applications
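RFC 1191 discovery amounts to a loop driven by ICMP "fragmentation needed" feedback. A simulation sketch over a hypothetical list of per-hop MTUs (the hop values are illustrative):

```python
def discover_path_mtu(hop_mtus, initial_mtu):
    """Simulate RFC 1191 PMTU discovery: send DF-marked packets at the
    current size; any hop with a smaller MTU 'returns' an ICMP
    fragmentation-needed message carrying its MTU, and the sender
    retries at that size until the packet fits every hop."""
    size = initial_mtu
    while True:
        # first hop whose MTU is too small rejects the DF-marked packet
        bottleneck = next((m for m in hop_mtus if m < size), None)
        if bottleneck is None:
            return size          # packet traversed the whole path
        size = bottleneck        # the ICMP message told us the next-hop MTU

# A jumbo-capable server behind a path with one standard-Ethernet hop:
print(discover_path_mtu([9000, 1500, 9000], 9000))  # → 1500
```

Because each ICMP reply carries the constraining MTU, the sender converges in one step per bottleneck rather than probing blindly, which is why the slide lists "maximum performance achieved" as an advantage.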
GigE Black Hole Hop
What is happening?
• RFC 1191 and TCP slow start are interacting
• Packets are lost
• Retransmission happens, causing performance degradation
• Client responds to some packets, keeping the connection open
• Overall performance appears slow to the client

[Diagram: a 9000 MTU server behind a Layer 2 switch sends DF-marked packets of 9000, then 4500, 2250 and 1125 bytes toward a 1500 MTU client; the switch drops the oversized frames without any ICMP feedback, and only the 1125-byte packet draws a response.]
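When the undersized hop drops jumbo frames silently (no ICMP), a stack can only fall back to black-hole detection, halving the probe size after each timeout; that matches the 9000 → 4500 → 2250 → 1125 sequence in the diagram. A simulation sketch, with the 1500-byte switch limit as the hypothetical silent hop:

```python
def blackhole_probe(initial_size, link_limit, min_size=576):
    """Simulate black-hole PMTU detection: DF-marked packets above the
    link limit vanish with no ICMP feedback, so after each timeout the
    sender halves the packet size until something gets through.
    Returns the sequence of sizes tried."""
    sizes, size = [], initial_size
    while size >= min_size:
        sizes.append(size)
        if size <= link_limit:     # frame fits; this probe draws a response
            return sizes
        size //= 2                 # timeout: halve and retransmit

print(blackhole_probe(9000, 1500))  # → [9000, 4500, 2250, 1125]
```

Each entry in that list costs at least one retransmission timeout, which is why the connection stays open but "overall performance appears slow to the client".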
Avoiding GigE MTU problems
• Assign MTUs on a per-subnet basis
• Be consistent with the MTU values used
  • Use 1500 bytes for legacy Ethernet (no registry hacks)
  • We recommend a 9000-byte MTU for GigE when jumbo frames are used (standard for the Internet2 Abilene Network)
  • Remember to add 18 bytes when adjusting frame size (e.g. set the NIC to a 9018-byte frame size to maintain a 9000-byte MTU)
• Remember not to arbitrarily filter out ICMP messages
• Careful use of VLANs
• Use Layer 3 devices at MTU boundaries
• Maintain logical Layer 3 diagrams

[Diagram: two subnets (Addr 10.0.1.1-254, Mask 255.255.255.0, default GW 10.0.1.1; Addr 10.0.2.1-254, Mask 255.255.255.0, default GW 10.0.2.1) joined by a router with MTU 9000.]
Path MTU Map Service
[Diagram: a web interface and map client code send requests to the map service code, which performs path MTU discovery, route parameter analysis and route analysis, archives the results, and returns path MTU route(s) in its response.]
• Researcher tool to troubleshoot and help optimize path MTU
Resources
Some Path MTU tools:
• ANA pMTU service – from ANA sequencers across I2/CA*net
  • http://pathmtu.apparenet.com:8282/
  • login: [email protected]:guest42
• NCNE MTU Discovery Service – uses a service located at NCNE
  • http://www.ncne.org/jumbogram/mtu_discovery.php
• pMTU Applet – Java-based client for the end-user station
  • http://sourceforge.net/projects/pmtu/

Jumbo MTU Performance whitepaper:
• http://www.apparentNetworks.com/wp/
Demo: pMTU Client
• Demo of the pMTU applet
End of Presentation