Eidgenössische Technische Hochschule Zürich
Ecole polytechnique fédérale de Zurich
Politecnico federale di Zurigo
Swiss Federal Institute of Technology Zurich

Ninth IEEE International Symposium on High Performance Distributed Computing, Pittsburgh, Pennsylvania, August 1-4, 2000

Speculative Defragmentation
A Technique to Improve the Communication Software Efficiency for Gigabit Ethernet

Ch. Kurmann, F. Rauch, M. Müller, T. Stricker
Laboratory for Computer Systems
ETHZ - Swiss Institute of Technology
CH-8092 Zurich


Comm. Speeds of Commodity PCs

→ For Gigabit Ethernet and TCP/IP, the OS software cannot keep up with the hardware speed

Figure: transfer rate [MByte/s] of MPI-Linux 2.0-BIP, MPI-Linux 2.2, TCP-Linux 2.2 and TCP-Windows NT over Gigabit Ethernet and Myrinet (both on 32-bit PCI); the bars show 20, 35, 42 and 126 MByte/s.


Overview

• Why Gigabit Ethernet

• Packet Defragmentation

• TCP/IP Overheads

• Speculative Packet Defragmentation

• Performance Analysis

• Conclusion


Problem Statement

How can we sustain network bandwidths of 75-100 MByte/s on a commodity PC cluster node with:

• memory copy at 90 MByte/s
• 32-bit PCI I/O bus at 132 MByte/s
• commodity Gigabit Ethernet adapter at 100 MByte/s

while keeping:

• the standard TCP/IP protocol
• a fully transparent, standard socket API
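The hardware numbers above already hint at why copies matter. The following Python sketch is a deliberately crude serial model, an assumption rather than a measurement from the talk: every software copy runs at the 90 MByte/s memory-copy rate, the single I/O transfer runs at the slower of the adapter and the PCI bus, and the stages do not overlap. `sustained_rate` is a hypothetical helper name.

```python
def sustained_rate(copies: int,
                   copy_bw: float = 90.0,   # memory copy, MByte/s (slide value)
                   nic_bw: float = 100.0,   # Gigabit Ethernet adapter, MByte/s
                   pci_bw: float = 132.0    # 32-bit PCI bus, MByte/s
                   ) -> float:
    """Crude serial model: each byte is copied `copies` times by the CPU and then
    moved once through the PCI bus and adapter. Stages are assumed NOT to overlap
    and memory-bus contention is ignored, so this is only a rough estimate."""
    # Seconds per MByte: one term per software copy plus the I/O transfer,
    # which is limited by the slower of the adapter and the PCI bus.
    seconds_per_mbyte = copies / copy_bw + 1.0 / min(nic_bw, pci_bw)
    return 1.0 / seconds_per_mbyte


for k in range(3):
    print(f"{k} copies: {sustained_rate(k):5.1f} MByte/s")
```

Under these assumptions it prints about 100.0, 47.4 and 31.0 MByte/s for zero, one and two copies, so even a single remaining copy keeps the node well below the 75-100 MByte/s target.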


Papers 10 Years Ago
The same problem, 30-100 times slower

• memory copy < 3 MByte/s
• VME I/O bus < 3 MByte/s
• commodity 10BaseT Ethernet adapter: 1 MByte/s

• special-purpose blast transfer protocol [Zwaenepoel85]

• optimistic bulk transfers [Carter89]

• transparent blasts by header padding [Peterson90]

Not standard protocols and not fully transparent

→ These solutions did not find their way into current systems!


Why Gigabit Ethernet

• Compatible with Ethernet and Fast Ethernet (UTP Cat 5)
• Uncomplicated technology, which results in high reliability and low cost
• Switched Ethernet provides link-level flow control on full-duplex channels
• In larger networks, only an unacknowledged, connectionless datagram delivery service → TCP needed
• Standard frame size is still limited to 46-1500 bytes of data
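To give a sense of the per-packet load that the 1500-byte limit implies, the following Python sketch computes the maximum frame rate of a 1 Gbit/s link. It assumes the standard 38 bytes of per-frame wire overhead (8-byte preamble/SFD, 14-byte MAC header, 4-byte FCS, 12-byte inter-frame gap) and counts MByte as 10^6 bytes; `frames_per_second` is an illustrative name.

```python
# Per-frame overhead on the wire for standard Ethernet, in bytes.
PREAMBLE_SFD = 8
MAC_HEADER = 14
FCS = 4
INTER_FRAME_GAP = 12
WIRE_OVERHEAD = PREAMBLE_SFD + MAC_HEADER + FCS + INTER_FRAME_GAP  # 38 bytes

LINK_BYTES_PER_S = 1_000_000_000 / 8  # 1 Gbit/s line rate


def frames_per_second(payload: int) -> float:
    """Maximum frame rate at 1 Gbit/s for a given frame payload (46..1500 bytes)."""
    if not 46 <= payload <= 1500:
        raise ValueError("standard Ethernet payload must be 46..1500 bytes")
    return LINK_BYTES_PER_S / (payload + WIRE_OVERHEAD)


for size in (46, 1500):
    fps = frames_per_second(size)
    print(f"{size:4d}-byte payload: {fps:9.0f} frames/s, "
          f"{fps * size / 1e6:5.1f} MByte/s of payload")
```

Even with full-size frames, a host receiving at line rate handles about 81,000 frames per second, and every 4 KByte page needs three of them.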


Alternatives / Extensions

• Dedicated network hardware with customized lightweight protocols: Myrinet, SCI, Giganet, ServerNet
  → primarily designed for internal communication in server farms

• Jumbo Frames (9 KByte) for Gigabit Ethernet, so that the Maximum Transmission Unit (MTU) can hold a whole memory page:
  → change of the standard
  → higher latencies in store-and-forward switches
  → header/payload separation still not solved
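The latency point can be quantified: a store-and-forward switch must receive a complete frame before forwarding it, so each hop adds at least the frame's serialization time. This Python sketch assumes a 1 Gbit/s link and a 9000-byte jumbo payload (the slide only says 9 KByte), and ignores preamble and inter-frame gap; the function name is illustrative.

```python
def store_and_forward_delay_us(frame_bytes: int, link_bps: float = 1e9) -> float:
    """Time a store-and-forward switch must wait to receive a whole frame
    before it can start forwarding it, in microseconds."""
    return frame_bytes * 8 / link_bps * 1e6


STANDARD_FRAME = 1518  # 1500-byte payload + 14-byte header + 4-byte FCS
JUMBO_FRAME = 9018     # assumed 9000-byte jumbo payload + the same 18 bytes

for name, size in (("standard", STANDARD_FRAME), ("jumbo", JUMBO_FRAME)):
    print(f"{name:8s}: {store_and_forward_delay_us(size):6.2f} us per switch hop")
```

This gives roughly 12.1 µs per hop for a standard frame versus roughly 72.1 µs for a jumbo frame.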


Packet [De]Fragmentation

• Standard IP technique:
  • data to be sent is fragmented into chunks smaller than the network MTU (Maximum Transmission Unit)
  • network protocols enclose the frames with a header/trailer
  • the receiver separates the headers from the payload and defragments the data again

• Implications for Ethernet:
  • MTU < memory page
  • DMA logic not optimal

→ Therefore memory copies for packet [de]fragmentation
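The standard IP mechanism described above can be sketched in a few lines. This is a simplified Python model of IPv4 fragmentation (offsets counted in 8-byte units, a more-fragments flag) and of a receiver that defragments by copying; `fragment` and `reassemble` are illustrative names, not kernel functions, and the 20-byte IP header assumes no IP options.

```python
IP_HEADER = 20  # IPv4 header without options


def fragment(payload: bytes, mtu: int = 1500):
    """Split an IP payload into fragments that fit the MTU.

    Returns (offset_in_8_byte_units, more_fragments_flag, chunk) tuples, as
    carried in the IPv4 fragment-offset and MF header fields."""
    max_data = (mtu - IP_HEADER) // 8 * 8  # non-final fragments: multiple of 8
    frags = []
    off = 0
    while off < len(payload):
        chunk = payload[off:off + max_data]
        more = off + len(chunk) < len(payload)
        frags.append((off // 8, more, chunk))
        off += len(chunk)
    return frags


def reassemble(frags):
    """Defragment by copying every chunk to its offset in a fresh buffer.

    This per-byte copy is the cost the slides want to eliminate.
    Sketch only: no hole or overlap detection, no timeouts."""
    last = next(f for f in frags if not f[1])  # the fragment with MF = 0
    total = last[0] * 8 + len(last[2])
    buf = bytearray(total)
    for off_units, _more, chunk in frags:
        start = off_units * 8
        buf[start:start + len(chunk)] = chunk
    return bytes(buf)


# 20-byte TCP header followed by a 4 KByte page of data.
data = bytes(20) + bytes(range(256)) * 16
frags = fragment(data)
print([(off, more, len(chunk)) for off, more, chunk in frags])
print(reassemble(list(reversed(frags))) == data)
```

Even when the fragments arrive out of order, the receiver can rebuild the datagram, but only by touching every payload byte once more.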


TCP/IP Host Overheads

• Single largest overhead: copying and checksums
  → zero-copy techniques

• Per-packet processing and interrupt overhead also high
  → interrupt coalescing

Figure: host overhead for TCP/IP over Gigabit Ethernet (PII 400 MHz, Linux 2.2), in percent CPU, split into copy & checksum, interrupt, TCP/IP, and driver/DMA init.
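Why checksums sit next to copying in the largest overhead becomes clear from the algorithm itself. The sketch below is the textbook RFC 1071 Internet checksum in plain Python, not the optimized code the Linux kernel uses; like a memory copy, it has to read every payload byte.

```python
def internet_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words, complemented."""
    if len(data) % 2:          # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # big-endian 16-bit word
    while total >> 16:                          # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])  # RFC 1071 example
print(hex(internet_checksum(sample)))
```

For the RFC 1071 sample bytes this yields 0x220d. A receiver verifies a segment by summing it together with the transmitted checksum and checking for zero, which is another full pass over the data unless it is merged with the copy.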


OS Environment

Figure: OS environment. Control path: User Application, Middleware (CORBA, MPI), Socket Layer, TCP/IP Stack, NIC Driver, NIC Firmware, NIC (on the PCI bus), split into userspace and kernelspace. Data path: user mapped data pages, send and receive buffers, system page pool (protocol handling, packet generation), DMA; the middleware adds ORB marshalling and buffering. Protection-boundary copies are addressed by previous work, driver copies by speculative defragmentation.


Required Technologies

• Well-known solutions to eliminate the user/kernel copy:
  • User-Level Network Interface (U-Net) or Virtual Interface Architecture (VIA)
  • User/kernel shared memory (FBufs, IOLite), copy emulation, or page remapping with copy-on-write

• The driver copy remains for Gigabit Ethernet

→ Goal: eliminate the driver copy for packet defragmentation and header separation

→ True zero-copy


Commodity GE Adapters

• Until now, zero-copy support has only been available for “intelligent” network adapters (ATM, SiliconTCP)

• Today’s Gigabit Ethernet adapters are too simple:
  • no processor or TLB on board
  • limited DMA capabilities
  • no protocol filtering implemented

→ A deterministic zero-copy implementation with commodity GE adapters is not possible!

• Approach: make just the common case fast

→ Speculation techniques for defragmentation


Speculative Defragmentation I

• Our driver manages to send/receive entire 4 KByte pages
• Decomposition of 4 KByte IP packets into 3 IP fragments at driver level (standard IP fragmentation)
• Attachment of headers to the payload data with a separate DMA descriptor

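Figure: each of the three fragments of a 4096-byte page is described by two DMA descriptors (status, length), one for the headers (data) and one for the zero-copy payload (zcdata):
1st fragment: Ethernet (14) + IP (20) + TCP (20) header bytes, 1460 payload bytes
2nd fragment: Ethernet (14) + IP (20) header bytes, 1480 payload bytes
3rd fragment: Ethernet (14) + IP (20) header bytes, 1156 payload bytes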
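The fragment sizes above follow from the 1500-byte MTU and the 8-byte granularity of IP fragment offsets. This Python sketch reproduces the layout as pairs of descriptors per fragment; `Descriptor` and `page_descriptors` are illustrative names, not the driver's actual data structures or the DMA descriptor format of any particular adapter.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096
ETH_HDR, IP_HDR, TCP_HDR = 14, 20, 20
MTU = 1500  # largest IP datagram in a standard Ethernet frame


@dataclass
class Descriptor:
    kind: str    # "header": separate header buffer; "zcdata": slice of the page
    offset: int  # byte offset into the page (0 for header buffers)
    length: int


def page_descriptors(page_size: int = PAGE_SIZE, mtu: int = MTU):
    """Describe one page as IP fragments, each with a header descriptor plus a
    zero-copy payload descriptor pointing straight into the page."""
    ip_payload = TCP_HDR + page_size            # the TCP header rides in fragment 1
    max_frag = (mtu - IP_HDR) // 8 * 8          # 1480, a multiple of 8
    result = []
    ip_off = page_off = 0
    while ip_off < ip_payload:
        frag_len = min(max_frag, ip_payload - ip_off)
        tcp = TCP_HDR if ip_off == 0 else 0     # only the first fragment has TCP
        data_len = frag_len - tcp
        result.append((ip_off, [
            Descriptor("header", 0, ETH_HDR + IP_HDR + tcp),
            Descriptor("zcdata", page_off, data_len),
        ]))
        ip_off += frag_len
        page_off += data_len
    return result


for ip_off, (hdr, data) in page_descriptors():
    print(f"IP offset {ip_off:4d}: header {hdr.length:2d} B, "
          f"page[{data.offset}:{data.offset + data.length}]")
```

The payload slices are page[0:1460], page[1460:2940] and page[2940:4096], so the DMA engine can gather a frame from a small header buffer plus a slice of the user page without any copy.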


Speculative Defragmentation II

What are we speculating about?
• That all fragments of a whole page will be received in order
• On the precise packet format (header lengths, data fields)

• The receiver has to set up the DMA descriptors in advance, without knowing which packets will arrive next

• In clusters with one or two switches, the probability is high that the three fragments arrive in order

• Software cleanup on mis-speculation
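The speculation and its cleanup can be modelled as a check performed after three frames have landed in pre-posted buffers. This Python sketch is deliberately simplified and hypothetical (`Frame`, `SPECULATED` and `bytes_to_fix_up` are illustrative names): a real driver would parse the headers out of the buffers and also verify checksums and protocol fields.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Frame:
    ip_id: int        # IP identification: equal for all fragments of one datagram
    frag_off: int     # fragment offset in bytes
    hdr_len: int      # header bytes as parsed from the received frame
    payload_len: int  # payload bytes written into the pre-posted page slice


# Layout the receiver bets on before anything arrives:
# (fragment offset, header length, payload length) for the three fragments of a page.
SPECULATED = [(0, 54, 1460), (1480, 34, 1480), (2960, 34, 1156)]


def bytes_to_fix_up(frames: Sequence[Frame]) -> int:
    """Check received frames against the speculated layout.

    Returns 0 if speculation succeeded (the page is already assembled in place,
    zero-copy); otherwise the number of payload bytes that the software cleanup
    path has to copy to where they really belong."""
    expected_id = frames[0].ip_id
    hit = len(frames) == len(SPECULATED) and all(
        f.ip_id == expected_id and (f.frag_off, f.hdr_len, f.payload_len) == s
        for f, s in zip(frames, SPECULATED))
    return 0 if hit else sum(f.payload_len for f in frames)


in_order = [Frame(7, 0, 54, 1460), Frame(7, 1480, 34, 1480), Frame(7, 2960, 34, 1156)]
print(bytes_to_fix_up(in_order))
print(bytes_to_fix_up([in_order[1], in_order[0], in_order[2]]))
```

In-order arrival costs nothing extra, while a reordered page falls back to copying its 4096 payload bytes, i.e. roughly what a conventional driver copies anyway.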


Speculative Defragmentation III
Fragmentation/defragmentation of a 4 KByte memory page by the DMA of the network interface

Figure: on the sending side, the three fragments (1460, 1480 and 1156 bytes of zcdata from the 4 KByte page, each preceded by its protocol headers) go out on the Ethernet network; on the receiving side, each frame's headers are placed in an sk_buff and its payload is written back into the matching part of a 4 KByte page.


Performance Evaluation

• Gains by successful speculation
• Penalty for speculation misses
• Speculation success rates in applications
• Consequences:
  - Network control architecture
  - Suggested hardware improvements


Gains with Speculation

→ 80 % increase in performance (bandwidth)

Figure: TCP/IP performance of Gigabit Ethernet, transfer rate [MByte/s], for Linux 2.2 standard, ZeroCopy remapping with copying driver, speculative defragmentation with copying socket API, speculative defragmentation with ZeroCopy remapping, and speculative defragmentation with ZeroCopy FBufs (bars grouped as "1 copy" and "0 copy"); plotted values 42, 45, 46, 65 and 75 MByte/s.


Penalty with Speculation Misses

→ The common case is fast, the fallback not much slower

Figure: TCP/IP performance of Gigabit Ethernet, transfer rate [MByte/s]:
• Compatibility (zero-copy sender, standard receiver): 45
• Linux 2.2 operation (standard sender, standard receiver): 42
• Fallback (standard sender, zero-copy receiver): 35


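Evaluation of Success Rates

• Application traces show the success of speculative transfers
• TreadMarks has an inherent scheduling that prevents interference
• TPC-D needs a control architecture or hardware changes

Ethernet frame counts per node (success rate = ok / zcopy):

| Trace | Node | total | large | zcopy | ok | Success rate |
|---|---|---:|---:|---:|---:|---:|
| TreadMarks SOR | Master | 68182 | 44010 | 44004 | 44004 | 100 % |
| TreadMarks SOR | Host2 | 50731 | 30419 | 30405 | 30399 | > 99 % |
| TreadMarks SOR | Host1 | 51095 | 30707 | 30693 | 30675 | > 99 % |
| Oracle TPC-D traces | Host2 | 62311 | 44848 | 41682 | 41682 | 100 % |
| Oracle TPC-D traces | Host1 | 67524 | 45877 | 37833 | 37833 | 100 % |
| Oracle TPC-D traces | Master | 129835 | 90725 | 79515 | 38235 | 48 % |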
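The success rates can be recomputed from the trace counts on this slide, and a simple model shows why the 48 % case matters. In this Python sketch, `blended_rate` is an illustrative assumption, not the authors' analysis: a fraction of the data equal to the success rate moves at a hypothetical zero-copy speed and the rest at a hypothetical fallback speed, and the two combine by time spent per byte. The 75 and 35 MByte/s inputs are example values only.

```python
def success_rate(zcopy: int, ok: int) -> float:
    """Fraction of speculatively received (zcopy) frames that were correct (ok)."""
    return ok / zcopy


def blended_rate(hit: float, fast: float, slow: float) -> float:
    """Throughput when a fraction `hit` of the data moves at `fast` MByte/s and the
    rest at `slow` MByte/s; the rates combine through time spent per byte."""
    return 1.0 / (hit / fast + (1.0 - hit) / slow)


# (zcopy, ok) frame counts of the two master nodes from the trace counts.
treadmarks_master = success_rate(44004, 44004)
tpcd_master = success_rate(79515, 38235)

# Illustrative speeds only: a zero-copy hit at 75 MByte/s, a miss at 35 MByte/s.
for name, p in (("TreadMarks master", treadmarks_master), ("TPC-D master", tpcd_master)):
    print(f"{name}: success {p:.0%}, about {blended_rate(p, 75, 35):.0f} MByte/s")
```

With these example inputs the TreadMarks master keeps the full 75 MByte/s, while the TPC-D master drops to roughly 47 MByte/s, close to the copying case.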


Network Control Architecture

• Problem: multiple simultaneous fast receives may garble the zero-copy frames

• Solution: admission control at the Ethernet driver level, with negotiation so that only one sender at a time may blast

• Implicit channel allocation by the OS works:
  • fully transparent
  • no explicit scheduling of transfers through a special interface → the API remains the same
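The slide does not spell out the negotiation protocol, so the following Python sketch only models the "one single sender" rule as a receiver-side lock. `BlastAdmission`, `request` and `release` are hypothetical names; how requests travel between hosts, and timeouts for senders that disappear, are left out.

```python
from typing import Optional


class BlastAdmission:
    """Hypothetical receiver-side admission control: at most one sender at a time
    may send speculative, page-sized zero-copy blasts to this receiver. Refused
    senders keep using the standard copying path, so the socket API is unchanged."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def request(self, sender: str) -> bool:
        """Grant the blast channel if it is free or already held by `sender`."""
        if self.holder in (None, sender):
            self.holder = sender
            return True
        return False

    def release(self, sender: str) -> None:
        """Give the channel back once the sender's bulk transfer is done."""
        if self.holder == sender:
            self.holder = None


rx = BlastAdmission()
print(rx.request("host1"))  # True: host1 may blast
print(rx.request("host2"))  # False: host2 falls back to the standard path
rx.release("host1")
print(rx.request("host2"))  # True
```

Because a refused sender simply stays on the normal path, the allocation can remain implicit and invisible to applications.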


Suggested Hardware Improvements

• Additional control path between the checksumming logic and the DMA logic for detection of protocol and header fields
  → reliable header/payload separation

• Stream detection with a simple matching register and a separate DMA descriptor chain for fast transfers:
  → detection of at least one high-performance stream
  → separation of this stream with its own DMA descriptors

→ Improved speculation rate, lower driver complexity
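A matching register of this kind can be modelled as a key comparison that picks one of two descriptor chains. The slide does not say which fields are matched; this Python sketch assumes IP source, destination and protocol, because only the first fragment of a datagram carries the TCP ports. `IpKey`, `StreamMatcher` and the addresses are hypothetical.

```python
from typing import NamedTuple


class IpKey(NamedTuple):
    """Fields present in every IPv4 fragment. Ports are deliberately left out:
    only the first fragment of a datagram carries the TCP header."""
    src: str
    dst: str
    proto: int


class StreamMatcher:
    """Hypothetical NIC model with one matching register and two DMA descriptor chains."""

    def __init__(self, fast_stream: IpKey) -> None:
        self.match_register = fast_stream
        self.zero_copy_chain: list = []  # pre-posted page-slice descriptors
        self.default_chain: list = []    # ordinary receive buffers

    def steer(self, key: IpKey, frame: bytes) -> str:
        """Place a frame on the chain selected by the match register."""
        if key == self.match_register:
            self.zero_copy_chain.append(frame)
            return "zero-copy"
        self.default_chain.append(frame)
        return "default"


nic = StreamMatcher(IpKey("10.0.0.1", "10.0.0.2", 6))        # 6 = TCP
print(nic.steer(IpKey("10.0.0.1", "10.0.0.2", 6), b"bulk"))   # zero-copy
print(nic.steer(IpKey("10.0.0.3", "10.0.0.2", 6), b"other"))  # default
```

Frames from other senders can no longer land in the speculatively prepared page slices, which is what raises the speculation rate.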


Conclusion

• Speculation techniques open a new horizon for optimized network drivers and permit an “almost” zero-copy implementation of TCP/IP over Gigabit Ethernet.

• In our implementation, performance was raised from 42 to 75 MByte/s (about 80%) using the standard Linux TCP stack and commodity network interface hardware.

• Speculation works in network interfaces just as it does for “Instruction Level Parallelism”, and should be considered when looking for simple and effective hardware improvements for network interfaces.

• Existing Ethernet protocols and standard network interface chipsets prevent an accurate, fully deterministic defragmentation in hardware.
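The percentage in the second bullet is the relative bandwidth gain over the standard Linux 2.2 stack:

```latex
\frac{75 - 42}{42} = \frac{33}{42} \approx 0.786
```

That is a gain of about 79 %, which the talk rounds to 80 %.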

