
1

Web Server Performance in a WAN Environment

Vincent W. Freeh
Computer Science
North Carolina State University

Vsevolod V. Panteleenko
Computer Science & Engineering
University of Notre Dame

2

Large web site

Complex design and interaction

Multiple tiers: appliance, web, app, & DB servers

Study performance of web server: cached pages

Most testing: simulated load, LAN environment

Our evaluation adds
• Simulated WAN environment: small MTU, BW limits, latency
• Shows some optimizations aren't as beneficial as in a LAN

[Diagram: clients → appliance web servers → application servers → database servers]

3

Evaluating a web server

Three parts: measuring the server, loading the server, supporting the server

[Diagram: clients generate server load across the Net; tiers 2 & 3 behind the appliance web servers generate server demand]

4

Two ways to load server

Synthetic load
• Controlled, reproducible, flexible
• Only as good as assumptions and mechanisms
• Hard to replicate the real world

Real-world load
• Uncontrolled, not reproducible (can use traces)
• Accurate model of the system
• Hard to produce extreme or rare conditions

Discussion: need both; validate simulations with real-world tests


5

Loading the server

Our tests use synthetic load with three load-generating tools

Micro-benchmarking tool
• Requests a single object at a constant rate
• Tests delivery of static, cached documents
• Establishes a baseline
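A minimal sketch of such a constant-rate micro-benchmark (illustrative only; the URL, rate, and duration are assumptions, not the tool the slides used):

```python
# Hypothetical constant-rate micro-benchmark: request one static object
# repeatedly at a fixed rate and report achieved throughput.
import time
import urllib.request

URL = "http://server.example/static/10k.html"   # assumed test object
RATE = 100          # requests per second (assumed)
DURATION = 10       # seconds

interval = 1.0 / RATE
bytes_received = 0
requests_done = 0
start = time.time()

while time.time() - start < DURATION:
    t0 = time.time()
    with urllib.request.urlopen(URL) as resp:
        bytes_received += len(resp.read())
    requests_done += 1
    # Sleep the remainder of the interval to hold the request rate constant.
    time.sleep(max(0.0, interval - (time.time() - t0)))

elapsed = time.time() - start
print(f"{requests_done/elapsed:.1f} req/s, {bytes_received/elapsed/1e6:.2f} MB/s")
```

Holding the inter-request interval fixed is what makes the tool useful as a baseline: the server sees a steady, known load for a single cached object.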


6

Modified SURGE

SURGE: Scalable URL Reference Generator (Barford & Crovella, Boston University)
Emulates statistical distributions of
• Object & request size
• Object popularity
• Embedded object references
• Temporal locality
• Use of idle periods

Modifications
• Converted from process-based to event-based, to increase the number of clients
• Server-throttling problem eliminated
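The modified SURGE itself is not shown in the slides; the sketch below only illustrates the process-based-to-event-based idea using Python's asyncio (host name, request, and client count are assumptions):

```python
# Illustrative event-based load generation: many emulated clients share one
# event loop instead of one process each, so a single load machine can
# sustain far more concurrent connections.
import asyncio

SERVER = ("server.example", 80)   # assumed target
NUM_CLIENTS = 1000                # emulated clients

async def emulated_client(cid: int) -> None:
    reader, writer = await asyncio.open_connection(*SERVER)
    writer.write(b"GET /index.html HTTP/1.0\r\n\r\n")
    await writer.drain()
    await reader.read()            # drain the reply until the server closes
    writer.close()
    await writer.wait_closed()

async def main() -> None:
    # All clients run concurrently on one event loop.
    await asyncio.gather(*(emulated_client(i) for i in range(NUM_CLIENTS)))

asyncio.run(main())
```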


7

Delays and limits

Emulate WAN parameters in a LAN: network delays, bandwidth limits

Modified kernel and protocol stack
• Separate delay queue per TCP connection
• Necessary for accurate emulation
• More accurate than Dummynet & NISTnet, which delay per interface
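The slides describe a kernel modification, which is not reproduced here; as a rough user-space illustration of the per-connection idea, the sketch below relays each TCP connection through its own delay line (backend address, listen port, and the asyncio approach are all assumptions):

```python
# Illustrative user-space emulation of a per-connection delay queue: a TCP
# relay that forwards each connection's bytes DELAY seconds after arrival,
# independently of every other connection.
import asyncio

DELAY = 0.200                       # 200 ms one-way delay (the slides' WAN setting)
BACKEND = ("server.example", 80)    # assumed real web server behind the emulator
LISTEN_PORT = 8080

async def delayed_pipe(reader, writer):
    # Each direction of each connection gets its own delay line, so one
    # connection's queued data never delays another connection's (the
    # per-connection property the slide contrasts with per-interface
    # emulators such as Dummynet and NISTnet).
    loop = asyncio.get_running_loop()
    while data := await reader.read(4096):
        # Forward this chunk DELAY seconds after it arrived, without
        # blocking the read of the next chunk.
        loop.call_later(DELAY, writer.write, data)
    loop.call_later(DELAY, writer.close)

async def handle(client_reader, client_writer):
    server_reader, server_writer = await asyncio.open_connection(*BACKEND)
    await asyncio.gather(
        delayed_pipe(client_reader, server_writer),   # request path
        delayed_pipe(server_reader, client_writer),   # reply path
    )

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", LISTEN_PORT)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```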


8

Measuring a web server

[Diagram: request/reply path through the server layers — HTTP (Apache, TUX), OS (TCP/IP, drivers), network]

9

Measuring a web server

[Diagram: same request/reply layer diagram as slide 8]

Measure utilization using HW performance counters
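The slides use hardware performance counters; as a simpler stand-in for the same kind of breakdown, the sketch below samples /proc/stat (assuming a modern Linux layout with separate irq/softirq columns) and reports where CPU time went over an interval:

```python
# Rough CPU-utilization breakdown by sampling /proc/stat twice: reports the
# fraction of time spent in user, system, hard-interrupt, and soft-interrupt
# context during the measurement interval.
import time

def cpu_times():
    with open("/proc/stat") as f:
        # First line: cpu  user nice system idle iowait irq softirq ...
        fields = f.readline().split()[1:]
    return list(map(int, fields))

before = cpu_times()
time.sleep(5)                      # measurement interval
after = cpu_times()

delta = [a - b for a, b in zip(after, before)]
total = sum(delta)
names = ["user", "nice", "system", "idle", "iowait", "hardirq", "softirq"]
for name, ticks in zip(names, delta):
    print(f"{name:8s} {100 * ticks / total:5.1f}%")
```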

10

Test environment

OS: Linux 2.4.8
Node (server & clients): Pentium III, 650 MHz, 512 MB main memory
NIC: 3Com 3C590, 100 Mbps Ethernet, direct connect
Software
• Client: micro-benchmarking, SURGE, delay/limits
• Server: Apache, TUX
Warmed client: no cache misses

[Diagram: two clients, each directly connected to the server through a dedicated NIC pair]

11

Cost breakdown – file size, Apache

Majority of time is spent in interrupt handling (receiving), but most data is sent.

• MTU = 536 bytes
• Delay = 200 ms
• BW = 56 Kbps
• Data send rate = 3 MB/s

12

Cost breakdown - file size, TUX

• Twice the data send rate of Apache
• Essentially all cost is in interrupts

• MTU = 536 bytes
• Delay = 200 ms
• BW = 56 Kbps
• Data send rate = 6 MB/s

13

Apache versus TUX

                          Apache     TUX
Server send rate          3.0 MB/s   6.0 MB/s
Packets received / s      5,738      11,991
Packets sent / s          6,156      11,878
Interrupts / s            7,482      13,974
Concurrent connections    784        1,451

14

Cost breakdown vs. MTU

SURGE parameters
• Size = 10 KB
• Delay = 200 ms
• BW = 56 Kbps
• Data send rate = 6 MB/s

15

Effects of network delay

SURGE parameters
• MTU = 536 bytes
• Size = 10 KB
• BW = 56 Kbps
• Data send rate = 6 MB/s

16

Effects of bandwidth limits

SURGE parameters
• MTU = 536 bytes
• Size = 10 KB
• Delay = 200 ms
• Data send rate = 6 MB/s

Overhead decreases by 20% as the bandwidth limit goes from 28 Kbps to unlimited

17

Persistent connections

SURGE parameters
• MTU = 536 bytes
• Size = 10 KB
• Delay = 200 ms
• Data send rate = 6 MB/s

10% decrease going from 1 to 16 requests per connection
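A minimal sketch of the client side of this experiment, reusing one HTTP/1.1 keep-alive connection for several requests (host, path, and object size are assumptions):

```python
# Illustrative client for the persistent-connection experiment: issue N
# requests over a single HTTP/1.1 keep-alive connection instead of opening
# a new connection per request.
import http.client

HOST = "server.example"           # assumed test server
REQUESTS_PER_CONNECTION = 16      # slide compares 1 vs. 16

conn = http.client.HTTPConnection(HOST)
for i in range(REQUESTS_PER_CONNECTION):
    conn.request("GET", "/static/10k.html")   # hypothetical 10 KB object
    resp = conn.getresponse()
    resp.read()     # must drain the body before reusing the connection
conn.close()
```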

18

Copy and checksumming

[Bar chart: CPU utilization (0–0.6) broken down into kernel mode, socket write, soft intr., and hard intr., for zero copy & HW checksumming vs. copy & checksum]

SURGE parameters
• MTU = 536 bytes
• Size = 10 KB
• Delay = 200 ms
• Data send rate = 6 MB/s
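For illustration, the sketch below contrasts the two send paths being compared: an ordinary read-and-send loop, which copies the file through user space, versus sendfile(), which lets the kernel move page-cache data straight to the socket and, with a capable NIC, offload checksumming (function names and chunk size are assumptions):

```python
# Illustrative contrast between the two send paths on the slide.
import os
import socket

def send_with_copy(sock: socket.socket, path: str) -> None:
    # Data crosses the user/kernel boundary and is copied (and checksummed)
    # by the protocol stack on the way out.
    with open(path, "rb") as f:
        while chunk := f.read(65536):
            sock.sendall(chunk)

def send_zero_copy(sock: socket.socket, path: str) -> None:
    # sendfile() avoids the user-space copy entirely; the kernel sends the
    # file's page-cache contents directly to the socket.
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            offset += os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
```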

19

Re-assess value of some optimizations

Copy & checksumming avoidance
• LAN: 25–111% for copy, or 21–33% copy & 10–15% checksum
• WAN: 10% combined

Select optimization
• LAN: 28%
• WAN: < 10%

Connection open/close avoidance (HTTP 1.1)
• LAN: “greatly”, “significantly”
• WAN: < 10%

20

Conclusion

Most processing is in the protocol stack and drivers
Small MTU size increases processing cost
Little effect from
• Network delay
• Bandwidth limitations
• Persistent connections

End-user request latency depends
• Primarily on connection bandwidth
• Secondarily on network delay

Future work
• Dynamic & uncached pages
• Add packet loss

Work supported by IBM UPP & NSF CCR9876073

www.csc.ncsu.edu/faculty/freeh/

21

End

22

Persistent connections - packets/s

23

Number of Packets vs. MTU

24

Web (HTTP) servers

Apache
• Largest install base
• User space
• Process-based model

TUX
• Niche server
• Kernel space
• Event-based model
• Aggressive optimizations: copy/checksum avoidance; object and name caching

25

Measuring a web server

[Diagram: same request/reply layer diagram as slide 8]

26

Interrupt coalescing

[Bar chart: CPU utilization (0–0.6) broken down into user mode, kernel mode, socket write, soft intr., and hard intr., for Apache and TUX with and without interrupt coalescing]

Decreases interrupt scheduling overhead: interrupt every 2 ms
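The 2 ms coalescing interval described here was implemented in the driver used for the experiments; on a modern Linux NIC a roughly equivalent knob can often be set with ethtool, invoked here via subprocess purely for illustration (device name and driver support for rx-usecs are assumptions):

```python
# Illustrative only: ask the NIC to raise a receive interrupt at most once
# every 2 ms (2000 microseconds) instead of per packet.
import subprocess

subprocess.run(["ethtool", "-C", "eth0", "rx-usecs", "2000"], check=True)
```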