
Posted on 28-Dec-2015

1

Network Stack Specialization for Performance

Presented by Donghwi Kim

(Some figures are taken from the paper)

2

Objective

• The authors tried to show an upper bound on network application performance achievable through specialization (not only the network stack but also the application's implementation is specialized)

• A special class of applications is chosen: those that serve the same content to many users
  • Sandstorm: a Web server that serves static web pages
  • Namestorm: a DNS server

3

Keys to performance

• A complete zero-copy stack
• Aggressive amortization
• Pre-packetized data
• Batching to mitigate system-call overhead

• Synchronous, clocked from received packets
  • Improves cache locality
  • Minimizes the latency of sending the first packet of a response

• Intel’s DDIO
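The receive-clocked, batched style listed above can be sketched as a toy event loop. This is a minimal sketch under stated assumptions, not Sandstorm's code: a Python `deque` stands in for the NIC receive ring, and `run_receive_clocked`, `handler`, and `batch_size` are hypothetical names. Each loop iteration models one poll (one system call) that drains up to `batch_size` packets; every response is produced in the same pass, while the request data is still cache-hot, and the responses are then handed off as one batched transmit.

```python
from collections import deque
from typing import Any, Callable, List, Optional


def run_receive_clocked(
    rx_ring: deque,
    handler: Callable[[Any], Optional[Any]],
    batch_size: int = 32,
) -> List[List[Any]]:
    """Toy receive-clocked loop (hypothetical, not the paper's API).

    rx_ring stands in for the NIC receive ring; work happens only when
    packets arrive, so the loop is "clocked" by received packets.
    Returns one list of responses per batch, i.e. per modeled TX call.
    """
    tx_batches: List[List[Any]] = []
    while rx_ring:
        # One modeled poll/system call drains up to batch_size packets.
        n = min(batch_size, len(rx_ring))
        batch = [rx_ring.popleft() for _ in range(n)]
        tx: List[Any] = []
        for pkt in batch:
            # Respond immediately in the same pass; no intermediate queue.
            resp = handler(pkt)
            if resp is not None:
                tx.append(resp)
        # One modeled transmit call per batch amortizes its fixed cost.
        tx_batches.append(tx)
    return tx_batches
```

Raising `batch_size` means fewer modeled system calls per packet, at the price of each response waiting for the rest of its batch.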

4

Network stack

• libnmio: data-movement and event-notification primitives
• libeth: a lightweight Ethernet layer
• libtcpip: an optimized TCP/IP layer
• libudpip: a UDP/IP layer
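To illustrate what a thin, stand-alone layer in the spirit of libeth might do, here is a minimal Python sketch that builds and parses an Ethernet II header and hands the payload to the next layer as a `memoryview`, so parsing copies no bytes. The names `eth_build`, `eth_parse`, and `ETHERTYPE_IPV4` are hypothetical; this is not libeth's actual interface.

```python
import struct

# Ethernet II header: destination MAC, source MAC, EtherType (network byte order).
ETH_HDR = struct.Struct("!6s6sH")
ETHERTYPE_IPV4 = 0x0800


def eth_build(dst: bytes, src: bytes, ethertype: int, payload: bytes) -> bytes:
    """Prepend a 14-byte Ethernet header (building does copy the payload)."""
    return ETH_HDR.pack(dst, src, ethertype) + payload


def eth_parse(frame: bytes):
    """Split a frame into header fields and a zero-copy view of the payload."""
    if len(frame) < ETH_HDR.size:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = ETH_HDR.unpack_from(frame, 0)
    # memoryview slicing references the original buffer instead of copying it.
    payload = memoryview(frame)[ETH_HDR.size:]
    return dst, src, ethertype, payload
```

A higher layer (e.g. an IP parser) would receive `payload` directly, without the Ethernet layer wrapping or buffering it.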

5

A complete zero-copy stack

• Receiving a packet
  • Done by DMA
• Transmitting a packet
  • Aggressive amortization
  • Modify one of the prepared copies of the packet and send it by DMA
  • The modifications are performed in a single pass to use the CPU's L1 cache efficiently
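The "modify a prepared copy" step can be sketched with a pre-built TCP segment whose fields are patched in place while the checksum is fixed up incrementally, so only the changed words are touched and nothing is copied. This is an assumption-laden sketch, not the paper's code: `patch_segment` and the other helper names are hypothetical, and the incremental update uses the standard RFC 1624 formula HC' = ~(~HC + ~m + m').

```python
import struct


def ones_complement_add(a: int, b: int) -> int:
    """16-bit one's-complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def checksum16(data) -> int:
    """Internet checksum (RFC 1071 style) over data, padded to even length."""
    buf = bytes(data)  # private copy, so padding never mutates the caller's buffer
    if len(buf) % 2:
        buf += b"\x00"
    s = 0
    for (word,) in struct.iter_unpack("!H", buf):
        s = ones_complement_add(s, word)
    return ~s & 0xFFFF


def incremental_update(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')."""
    s = ones_complement_add(~old_csum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_complement_add(s, new_word)
    return ~s & 0xFFFF


def patch_segment(seg: bytearray, csum_off: int, updates: dict) -> None:
    """Hypothetical helper: overwrite 16-bit fields of a prepared segment in
    place (updates maps even byte offset -> new value) and adjust the stored
    checksum as each field changes, touching only the modified words."""
    csum = struct.unpack_from("!H", seg, csum_off)[0]
    for off, new in updates.items():
        old = struct.unpack_from("!H", seg, off)[0]
        struct.pack_into("!H", seg, off, new)
        csum = incremental_update(csum, old, new)
    struct.pack_into("!H", seg, csum_off, csum)
```

For brevity the checksum here covers only the segment bytes; a real TCP checksum also covers a pseudo-header with the IP addresses, protocol, and length.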

6

A complete zero-copy stack

• Pre-copy method
  • Maintain more than one copy of each packet
  • Potential to thrash the CPU's L3 cache
• memcpy method
  • Maintain one long-term copy and create ephemeral copies from it
  • More work must be done for each transmission
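The trade-off between the two methods can be made concrete with a toy buffer manager. This is a hypothetical sketch (`PreCopyPool`, `MemcpyPool`, `acquire`, `release`, and `footprint` are illustrative names, not the paper's API): pre-copy hands out one of several prepared copies to be modified in place and cannot reuse a copy until the NIC has finished with it, whereas memcpy keeps a single long-term copy and pays for a fresh ephemeral copy on every send.

```python
from typing import List, Optional, Tuple


class PreCopyPool:
    """Pre-copy: several long-term copies of one packet, patched in place."""

    def __init__(self, template: bytes, copies: int):
        self.buffers: List[bytearray] = [bytearray(template) for _ in range(copies)]
        self.in_flight: List[bool] = [False] * copies

    def acquire(self) -> Optional[Tuple[int, bytearray]]:
        # Return a copy the NIC is not currently using; no bytes are copied.
        for i, busy in enumerate(self.in_flight):
            if not busy:
                self.in_flight[i] = True
                return i, self.buffers[i]
        return None  # every copy is still owned by the NIC

    def release(self, i: int) -> None:
        # Modeled TX-completion notification: the copy may be reused.
        self.in_flight[i] = False

    def footprint(self) -> int:
        # Long-term memory held: grows with the number of copies.
        return sum(len(b) for b in self.buffers)


class MemcpyPool:
    """memcpy: one long-term copy; an ephemeral copy per transmission."""

    def __init__(self, template: bytes):
        self.master = bytes(template)

    def acquire(self) -> bytearray:
        # The extra work: the whole packet is copied on every send.
        return bytearray(self.master)

    def footprint(self) -> int:
        # Counts only the long-term copy; ephemeral copies are transient.
        return len(self.master)
```

Across many distinct packets, the pre-copy footprint scales with the number of copies per packet, which is the L3-cache pressure named on the slide; memcpy keeps the long-term footprint small at the cost of a copy per send.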

7

How do the optimizations work?

• Batching increases the TCP round-trip time (RTT)
• Amortization reduces per-request processing
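A back-of-the-envelope model shows why both statements hold at once. It assumes a fixed cost per system call and a fixed cost per packet; `batch_model`, `syscall_ns`, and `per_pkt_ns` are hypothetical names, and the numbers below are made up rather than taken from the paper. The amortized cost per packet falls as the call cost divided by the batch size, while the time needed to get through a whole batch, and therefore the delay some responses see, grows linearly with the batch size.

```python
from typing import Tuple


def batch_model(batch: int, syscall_ns: float, per_pkt_ns: float) -> Tuple[float, float]:
    """Hypothetical cost model for processing one batch of packets.

    Returns (amortized ns per packet, ns to process the whole batch).
    """
    if batch < 1:
        raise ValueError("batch must be >= 1")
    # One system call is shared by the whole batch; per-packet work is not.
    batch_time = syscall_ns + batch * per_pkt_ns
    return batch_time / batch, batch_time


for b in (1, 8, 32):
    per_pkt, total = batch_model(b, syscall_ns=1000, per_pkt_ns=100)
    print(f"B={b:2d}: {per_pkt:7.2f} ns/packet, {total:6.0f} ns per batch")
```

With these made-up costs, going from B = 1 to B = 32 cuts the amortized cost from 1100 ns to about 131 ns per packet, while the time to get through one batch grows from 1.1 µs to 4.2 µs.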

8

Intel’s DDIO

• Direct Data I/O

• On transmission
  • The NIC pulls data from the L3 cache without a detour through system memory
• On reception
  • DMA can place data directly in the processor's L3 cache

9

Evaluation

10

Evaluation

11

Evaluation

12

DDIO

• Pre-copy case: DDIO pulls untouched incoming data into the cache, so the file data cannot be cached
• memcpy case: the CPU loads the file data into the cache

13

Discussion

• mTCP vs. Sandstorm

14

Discussion

• mTCP
  • Provides a UNIX-like socket programming interface
  • Provides fairness

• Sandstorm's TCP
  • A higher-level library does not wrap the lower-level one
  • Each library is a stand-alone service; for example, an application interacts directly with libnmio
  • With amortization, no queueing, and inaccurate timers, correctness cannot be guaranteed
  • Applicable only to a limited set of applications