TCP Servers: Offloading TCP Processing in Internet Servers.
Design, Implementation, and Performance
M. Rangarajan, A. Bohra, K. Banerjee, E.V. Carrera, R. Bianchini, L. Iftode, W. Zwaenepoel.
Presented by:
Thomas Repantis
CS260-Seminar in Computer Science, Fall 2004 – p.1/35
Overview
To execute TCP/IP processing on a dedicated processor, node, or device (the TCP server), using low-overhead, non-intrusive communication between it and the host(s) running the server application.
Three TCP Server architectures:
1. A dedicated network processor on a symmetric multiprocessor (SMP) server.
2. A dedicated node on a cluster-based server built around a memory-mapped communication interconnect such as VIA.
3. An intelligent network interface in a cluster of intelligent devices with a switch-based I/O interconnect such as Infiniband.
Introduction
• The network subsystem is nowadays one of the major performance bottlenecks in web servers: every outgoing data byte has to go through the same processing path in the protocol stack down to the network device.
• Proposed solution, a TCP Server architecture: decoupling the TCP/IP protocol stack processing from the server host and executing it on a dedicated processor/node.
Introductory Details
• The communication between the server host and the TCP server can dramatically benefit from using low-overhead, non-intrusive, memory-mapped communication.
• The network programming interface provided to the server application must use and tolerate asynchronous socket communication to avoid data copying.
Motivation
• The web server spends only 20% of its execution time in user space.
• Network processing, which includes TCP send/receive, interrupt processing, bottom-half processing, and IP send/receive, takes about 71% of the total execution time.
• The costs include processor cycles devoted to TCP processing, plus cache and TLB pollution (OS intrusion on the application execution).
TCP Server Architecture
• The application host avoids TCP processing by tunneling the socket I/O calls to the TCP server using fast communication channels.
• Shared memory and memory-mapped communication for tunneling.
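The tunneling idea can be sketched as a single-producer/single-consumer request ring in shared memory: the host posts socket calls, the TCP server drains them, and neither side needs a system call or an interrupt. All names here (`sock_req`, `chan_post`, `chan_poll`) are hypothetical; the paper's actual data structures may differ.

```c
#include <stdatomic.h>

/* Hypothetical socket channel: a shared-memory ring carrying tunneled
 * socket calls from the host to the TCP server. */
#define RING_SLOTS 64

typedef struct {
    int  opcode;              /* e.g. SEND, RECV, ACCEPT */
    int  sock_id;             /* which open socket */
    long buf_addr;            /* shared-memory offset of the payload */
    long len;
} sock_req;

typedef struct {
    sock_req slots[RING_SLOTS];
    _Atomic unsigned head;    /* advanced by the producer (host) */
    _Atomic unsigned tail;    /* advanced by the consumer (TCP server) */
} sock_channel;

/* Host side: enqueue a request without trapping into the kernel. */
int chan_post(sock_channel *c, const sock_req *r) {
    unsigned h = atomic_load(&c->head);
    if (h - atomic_load(&c->tail) == RING_SLOTS)
        return -1;                        /* channel full */
    c->slots[h % RING_SLOTS] = *r;
    atomic_store(&c->head, h + 1);        /* publish the request */
    return 0;
}

/* TCP-server side: poll the channel for the next request. */
int chan_poll(sock_channel *c, sock_req *out) {
    unsigned t = atomic_load(&c->tail);
    if (t == atomic_load(&c->head))
        return 0;                         /* nothing pending */
    *out = c->slots[t % RING_SLOTS];
    atomic_store(&c->tail, t + 1);
    return 1;
}
```

Because producer and consumer each write only their own index, the queue needs no locks, which is what makes the channel "non-intrusive" on the application host.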
Advantages
• Kernel Bypassing.
• Asynchronous Socket Calls.
• No Interrupts.
• No Data Copying.
• Process Ahead.
• Direct Communication with File Server.
Kernel Bypassing
• Bypassing the host OS kernel.
• Establishing a socket channel between the application and the TCP server for each open socket.
• The socket channel is created by the host OS kernel during the socket call.
Asynchronous Socket Calls
• Maximum overlap between the TCP processing of the socket call and the application execution.
• Avoid context switches whenever possible.
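The asynchronous interface can be illustrated with a send descriptor the host posts and later checks, never blocking in the kernel. This is an illustrative sketch; the descriptor layout and function names (`post_send`, `send_complete`, `finish_send`) are assumptions, not the paper's API.

```c
#include <stdatomic.h>

/* Hypothetical asynchronous send descriptor shared between the host
 * and the TCP server. */
typedef struct {
    const char *buf;          /* application buffer */
    long        len;
    _Atomic int done;         /* 0 = in flight, 1 = completed */
    long        result;       /* bytes sent, valid once done */
} async_send;

/* Host side: hand off the descriptor and return immediately, so the
 * application overlaps its own work with the TCP processing. */
void post_send(async_send *s, const char *buf, long len) {
    s->buf = buf; s->len = len; s->result = 0;
    atomic_store(&s->done, 0);
    /* at this point the descriptor becomes visible to the TCP server */
}

/* Host side: non-blocking completion check, avoiding a context switch. */
int send_complete(const async_send *s, long *result) {
    if (!atomic_load(&s->done)) return 0;
    *result = s->result;
    return 1;
}

/* TCP-server side: mark the call complete after protocol processing. */
void finish_send(async_send *s, long bytes) {
    s->result = bytes;
    atomic_store(&s->done, 1);
}
```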
No Interrupts
• Since the TCP server exclusively executes TCP processing, interrupts can easily and beneficially be replaced with polling.
• Too high a polling frequency would lead to bus congestion, while too low a frequency would result in an inability to handle all events.
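One common way to navigate that trade-off is to poll with a bounded per-round batch: each round drains at most a fixed budget of events, capping bus traffic per poll while still draining backlogs over successive rounds. A toy sketch, with hypothetical names (`nic_queue`, `poll_round`) and an arbitrary budget:

```c
/* Toy polling loop for the dedicated TCP-server processor: instead of
 * taking an interrupt per packet, poll the NIC event queue and handle
 * at most POLL_BATCH events per round. */
#define POLL_BATCH 8

typedef struct {
    int pending;    /* events waiting in the NIC queue */
    int handled;    /* events processed so far */
} nic_queue;

/* One polling round: drain up to POLL_BATCH pending events. */
int poll_round(nic_queue *q) {
    int n = q->pending < POLL_BATCH ? q->pending : POLL_BATCH;
    q->pending -= n;
    q->handled += n;
    return n;       /* events handled this round */
}
```

With 20 events pending, three rounds of budget 8 handle 8, 8, and 4 events; tuning the budget and the inter-round delay is exactly the frequency trade-off the bullet describes.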
No Data Copying
• With asynchronous system calls, the TCP server can avoid the double copying performed in the traditional in-kernel TCP implementation of the send operation.
• The application must tolerate waiting for completion of the send.
• For retransmission, the TCP server can read the data again from the application send buffer.
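The copy-avoidance logic can be sketched as a send record that keeps a reference to the application buffer instead of copying it, re-reads the unacknowledged tail on retransmission, and releases the buffer for reuse only once everything is acknowledged. Names (`zc_send`, `zc_retransmit`) are illustrative assumptions.

```c
/* Sketch of a copy-avoiding send: the TCP server records a reference
 * to the application's buffer rather than copying the payload into a
 * kernel socket buffer. */
typedef struct {
    const char *app_buf;   /* application buffer, never copied */
    long        len;
    long        acked;     /* bytes acknowledged by the peer */
} zc_send;

void zc_post(zc_send *s, const char *buf, long len) {
    s->app_buf = buf; s->len = len; s->acked = 0;
}

/* Retransmit: re-read the unacknowledged tail of the app buffer. */
const char *zc_retransmit(const zc_send *s, long *len) {
    *len = s->len - s->acked;
    return s->app_buf + s->acked;
}

/* The application may reuse the buffer only after full acknowledgment;
 * this is the "tolerate the wait" requirement from the slide. */
int zc_buffer_free(const zc_send *s) { return s->acked == s->len; }
```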
Process Ahead
• The TCP server can execute certain operations ahead of time, before they are actually requested by the host.
• Specifically, the accept and receive system calls.
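Eager accept, for instance, can be pictured as a pool of connections whose handshakes the TCP server completes before the host ever asks, so a later accept from the application returns immediately. A minimal sketch with hypothetical names (`accept_pool`, `pool_accept`):

```c
/* Eager-accept sketch: the TCP server finishes handshakes as soon as
 * connections arrive and parks them in a pre-accepted pool. */
#define POOL_SIZE 16

typedef struct {
    int conns[POOL_SIZE];   /* pre-accepted connection ids */
    int count;
} accept_pool;

/* TCP-server side: handshake completed before the host requested it. */
int pool_add(accept_pool *p, int conn_id) {
    if (p->count == POOL_SIZE) return -1;   /* pool full */
    p->conns[p->count++] = conn_id;
    return 0;
}

/* Host side: accept just takes a ready connection, with no waiting. */
int pool_accept(accept_pool *p) {
    if (p->count == 0) return -1;           /* fall back to waiting */
    return p->conns[--p->count];
}
```

Eager receive works analogously: data is pulled toward the host before the application issues its receive call.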
Direct Communication with File Server
• In a multi-tier architecture, a TCP server can be instructed to perform direct communication with the file server.
TCP Server in an SMP-based Architecture
• Dedicating a subset of the processors to in-kernel TCP processing.
• Network-generated interrupts are routed to the dedicated processors.
• The communication between the application and the TCP server is through queues in shared memory.
SMP-based Architecture Details
• Offloading interrupts and receive processing.
• Offloading TCP send processing.
TCP Server in a Cluster-based Architecture
• Dedicating a subset of nodes to TCP processing.
• VIA-based SAN interconnect.
Cluster-based Architecture Operation
• The TCP server node acts as the network endpoint for the outside world.
• The network data is transferred between the host node and the TCP server node across the SAN using low-latency memory-mapped communication.
Cluster-based Architecture Details
• The socket call interface is implemented as a user-level communication library.
• With this library, a socket call is tunneled across the SAN to the TCP server.
• Several implementations:
1. Split-TCP (synchronous)
2. AsyncSend
3. Eager Receive
4. Eager Accept
5. Setup With Accept
TCP Server in an Intelligent-NIC-based Architecture
• Cluster of intelligent devices over a switch-based I/O interconnect (Infiniband).
• The devices are considered "intelligent", i.e., each device has a programmable processor and local memory.
Intelligent-NIC-based Architecture Details
• Each open connection is associated with a memory-mapped channel between the host and the I-NIC.
• During a message send, the message is transferred directly from user space to a send buffer at the interface.
• A message receive is first buffered at the network interface and then copied directly to user space at the host.
4-way SMP-based Evaluation
• Dedicating two processors to network processing is always better than dedicating only one.
• Throughput benefits of up to 25-30%.
4-way SMP-based Evaluation
• When only one processor is dedicated to network processing, the network processor becomes a bottleneck and, consequently, the application processor suffers from idle time.
• When we apply two processors to handling the network overhead, there is enough network-processing capacity and the application processor becomes the bottleneck.
• The best system would be one in which the division of labor between the network and application processors is more flexible, allowing for some measure of load balancing.
2-node Cluster-based Evaluation for Static Load
• Asynchronous send operations outperform their synchronous counterparts.
2-node Cluster-based Evaluation for Static Load
• Smaller gain than that achievable with the SMP-based architecture.
• 17% is the greatest throughput improvement we can achieve with this architecture/workload combination.
2-node Cluster-based Evaluation for Static Load
• In the case of Split-TCP and AsyncSend, the host has idle time available, since it is the network processing at the TCP server that proves to be the bottleneck.
2-node Cluster-based Evaluation for Static and Dynamic Load
• Split-TCP and AsyncSend systems saturate later than regular TCP.
2-node Cluster-based Evaluation for Static and Dynamic Load
• At an offered load of about 500 reqs/sec, the host CPU is effectively saturated.
• 18% is the greatest throughput improvement we can achieve with this architecture.
2-node Cluster-based Evaluation for Static and Dynamic Load
• Balanced configurations depend heavily on the particular characteristics of the workload.
• A dynamic load-balancing scheme between host and TCP server nodes is required for ideal performance under dynamic workloads.
Intelligent-NIC-based Simulation Evaluation
• For all the simulated processor speeds, the Split-TCP system outperforms all the other implementations.
• The improvements over a conventional system range from 20% to 45%.
Intelligent-NIC-based Simulation Evaluation
• The ratio of processing power at the host to that available at the NIC plays an important role in determining server performance.
• In Split-TCP, the processor on the NIC saturates much earlier than the host processor or the network.
• We can achieve better performance with a Split-TCP implementation only with a fast processor on the NIC.
Conclusions about TCP Servers 1/2
• Offloading TCP/IP processing is beneficial to overall system performance when the server is overloaded.
• An SMP-based approach to TCP servers is more efficient than a cluster-based one.
• The benefits of SMP- and cluster-based TCP servers reach 30% in the scenarios we studied.
• The simulated results show greater gains, of up to 45%, for a cluster of devices.
Conclusions about TCP Servers 2/2
• TCP servers require substantial computing resources for complete offloading.
• The type of workload plays a significant role in the efficiency of TCP servers.
• Depending on the application workload, either the host processor or the TCP server can become the bottleneck.
• Hence, a scheme to balance the load between the host and the TCP server would be beneficial for server performance.