
Effects of Interrupt Coalescence on Network Measurements *

Ravi Prasad, Manish Jain, and Constantinos Dovrolis

College of Computing, Georgia Tech., USA
ravi,jain,[email protected]

Abstract. Several high-bandwidth network interfaces use Interrupt Coalescence (IC), i.e., they generate a single interrupt for multiple packets received in a short time period. IC decreases the per-packet interrupt processing overhead at the CPU. However, IC also introduces queueing delays and alters the “dispersion” (i.e., interarrival time spacing) of packet pairs or trains. In this work, we first explain how IC works in two popular Gigabit Ethernet controllers, and identify the potential negative effects of IC on active and passive network measurements. Specifically, we show that IC can affect active bandwidth estimation techniques, causing erroneous measurements. It can also alter the packet interarrivals in passive monitors that use commodity network interfaces. Then, we describe the “signature” of IC in the dispersion and one-way delays of packet trains. We show that this signature can be detected and removed from the raw measurements, enabling accurate bandwidth estimation. We finally show that IC can also be detrimental to TCP self-clocking, causing bursty delivery of ACKs and subsequent bursty transmission of data segments.

1 Introduction

The arrival and departure of packets at a Network Interface Card (NIC) are two events that the CPU typically learns about through interrupts. An interrupt-driven kernel, however, can get into a receive livelock state in which the CPU spends all its time processing network interrupts without having available cycles to process the data contained in the received packets [1]. In Gigabit Ethernet (GigE) paths, 1500-byte packets can arrive at a host every 12µs. If the interrupt processing overhead is longer than 12µs, receive livelock can occur if an interrupt is generated for every arriving packet.
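For concreteness, the 12µs figure is simply the transmission time of a 1500-byte packet on a 1 Gbps link:

1500 bytes × 8 bits/byte / 10^9 bits/s = 12 µs.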

* This work was supported by the “Scientific Discovery through Advanced Computing” program of the US Department of Energy (award number: DE-FG02-02ER25517), by the “Strategic Technologies for the Internet” program of the US National Science Foundation (award number: 0230841), and by an equipment donation from Intel. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the previous funding sources.

The CPU overhead for each arriving packet consists of a context switch, followed by the execution of the network interrupt service routine, followed by another context switch. In order to reduce the per-packet CPU overhead, and to avoid receive livelock, most high-bandwidth network interfaces today (in particular GigE cards) use Interrupt Coalescence (IC). IC is a technique in which NICs delay interrupt generation so that a single interrupt can serve several packets received in a short time interval. Similarly, at the sending host, the CPU learns about the departure of several packets through a single interrupt. IC is not a new technique; it has been used repeatedly in the past whenever a new LAN technology brings the network speed closer to the CPU speed [1].

In this work, we identify and describe the negative effects of IC on both active and passive network measurements. Specifically, IC can affect active bandwidth estimation techniques, if the latter ignore the possibility of IC at the receiver’s NIC. Bandwidth estimation techniques are classified as either capacity measurements or available bandwidth measurements. For a recent survey, we refer the reader to [2]. Capacity estimation is based on the analysis of dispersion measurements from back-to-back packet pairs and packet trains. Available bandwidth estimation, on the other hand, is based on the relation between the departure and arrival rates of periodic packet streams.

We have modified our previous capacity and available bandwidth measurement techniques, presented in [3] and [4] respectively, so that they can provide accurate estimates even in the presence of IC. For capacity estimation, we adjust the length of the probing packet trains so that we can detect the IC signature in the receiver dispersion measurements. Then, we can filter out the erroneous measurements and estimate the correct path capacity. Similarly, for available bandwidth estimation, we present an algorithm that can detect the IC signature in one-way delay measurements of periodic packet streams. The algorithm ignores the one-way delay measurements that have been affected by IC, and so it can accurately estimate the arrival rate of a periodic packet stream.

IC can also affect passive measurements that use commodity NICs for packet trace capture. The reason is that IC can alter the packet interarrivals in the monitored traffic stream. We also show that IC can be detrimental to TCP self-clocking, causing bursty delivery of ACKs to the sender and bursty transmission of data segments. We finally argue that the benefits of IC may not be important in practice if the ratio of the CPU speed to the network interface speed is sufficiently large.

The rest of the paper is structured as follows. §2 explains how IC works in two popular GigE controllers. §3 identifies and describes the negative effects of IC on both active and passive network measurements, and it describes methodologies that can be used to estimate the capacity and available bandwidth of the path even in the presence of IC. §4 concludes the paper.

2 Description of Interrupt Coalescence

The basic goal of IC is to reduce the total number of network interrupts generated per unit of time. This can be achieved by delaying an interrupt either for some time, expecting that more packets will be received in the meanwhile, or until a certain occupancy level is reached at the NIC buffers. Most GigE NICs today support IC by buffering packets for a variable time period (dynamic mode) or for a fixed time period (static mode) [5, 6]. In dynamic mode, the NIC controller determines an appropriate interrupt generation rate based on the network/system load. In static mode, the interrupt generation rate or latency is set to a fixed value specified in the driver.

Users (with root privileges) can modify the default controller behavior by using the parameters provided by NIC drivers. For instance, the driver for the Intel GigE E1000 controller [7] provides the following parameters for adjusting the IC receive (Rx) and transmit (Tx) operations:

– InterruptThrottleRate: sets an upper limit on the number of interrupts generated per second. This is enforced by imposing a minimum time interval between consecutive interrupts.

– (Rx,Tx)AbsIntDelay: delays the generation of an interrupt for a fixed period after the first packet that arrives following the last interrupt. This represents the maximum duration for which a packet can be buffered in the NIC.

– (Rx,Tx)IntDelay: sets the delay between the last packet arrival and the generation of the next interrupt. This parameter represents the minimum duration for which a packet can be buffered in the NIC.

Note that (Rx,Tx)AbsIntDelay, (Rx,Tx)IntDelay and InterruptThrottleRate are used in combination and, in case of conflict, InterruptThrottleRate takes precedence.


Fig. 1. Illustration of IC with RxAbsIntDelay (Case I) and with RxIntDelay (Case II).

Figure 1 illustrates how the interrupts would be generated if the receiving host’s NIC only employed one of the two parameters, RxIntDelay or RxAbsIntDelay. In case I, RxAbsIntDelay is used and the interrupt timer is set to expire a fixed duration after the arrival of the first packet. All subsequent packets that arrive before the timer expires are buffered in the NIC and are delivered on the next interrupt. If the minimum packet spacing is T, then the maximum number of packets that will be buffered is RxAbsIntDelay/T. In case II, every packet arrival resets the previous interrupt timer. Consequently, if packets arrive periodically with spacing less than RxIntDelay, interrupt generation can be delayed indefinitely, exhausting the resources of the NIC. This scenario is illustrated in case II, where 9 packets are delivered by the second interrupt.
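To make the two timer policies concrete, the following Python sketch (our own simplified model, not the controller's actual logic; the parameter names only mirror the driver options above) computes interrupt times for a periodic arrival pattern under an absolute-delay timer (Case I) and a per-packet reset timer (Case II):

def coalesce(arrivals, delay, absolute):
    """Return (interrupt_time, packets_delivered) tuples for a simplified
    IC model: 'absolute' mimics RxAbsIntDelay (timer armed by the first
    packet after an interrupt), otherwise RxIntDelay (timer reset by
    every packet). Times are in microseconds."""
    interrupts, pending, timer = [], [], None
    for t in arrivals:
        # fire the timer if it expired before this arrival
        if timer is not None and t >= timer:
            interrupts.append((timer, len(pending)))
            pending, timer = [], None
        pending.append(t)
        if absolute:
            if timer is None:            # armed only by the first buffered packet
                timer = t + delay
        else:
            timer = t + delay            # every packet pushes the timer back
    if timer is not None:
        interrupts.append((timer, len(pending)))
    return interrupts

arrivals = [i * 12.0 for i in range(20)]                  # MTU packets every 12 us
print(coalesce(arrivals, delay=125.0, absolute=True))     # Case I: ~10 packets per interrupt
print(coalesce(arrivals, delay=125.0, absolute=False))    # Case II: one interrupt after the last arrival

With MTU arrivals every 12µs and a 125µs timer, Case I delivers roughly ten packets per interrupt, while Case II postpones a single interrupt until after the last arrival, matching the behaviors sketched in Figure 1.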

Similarly, the SysKonnect GigE driver [8] provides two parameters: moderate and intpersec. Moderate determines whether IC is enabled and whether it is in static or dynamic mode. Intpersec determines the minimum latency of interrupt generation in the static mode. This parameter is similar to the InterruptThrottleRate for Intel NICs.

3 Effects of Interrupt Coalescence

In the following, we describe how IC affects most active and passive measurement tools as well as TCP’s self-clocking behavior. We also describe the signature that can be used to detect the presence of IC. Finally, we present techniques to estimate end-to-end capacity and available bandwidth in the presence of IC.

3.1 Capacity Estimation

Several active bandwidth estimation tools use the measured dispersion of packet pairs at the receiver to estimate the capacity of a network path [2]. The basic idea is that the dispersion of a packet pair is inversely proportional to the path capacity, in the absence of cross traffic. Note that the dispersion of packet pairs is typically measured at the application or kernel, and not at the NIC. If the receiver’s NIC performs IC, however, the packet pair will be delivered to the kernel and then to the application with a single interrupt, destroying the dispersion that the packet pair had at the network link.
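In the absence of cross traffic and IC, the receiver can therefore invert a single measured dispersion ∆ to obtain the capacity, C = L/∆. For example, a 1500-byte probing packet received with ∆ = 12µs dispersion implies C = 1500 × 8 bits / 12µs = 1 Gbps.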

Figure 2 shows the cumulative dispersion of a 100-packet MTU train, with and without IC, at the receiver of a GigE path. The capacity estimate, as obtained without IC, is quite close to the actual capacity of the path (about 1000 Mbps). In the case of IC, however, we see that the application receives 10-12 packets at a very high rate (unrelated to the network capacity), separated by about 125µs of idle time. Note that if a bandwidth estimation tool attempts to measure the path capacity from the dispersion of packet pairs in this case, it will fail miserably because there is not a single packet pair with the correct dispersion.

[Figure 2 plots the cumulative dispersion (µsec) versus packet ID for three curves: ideal, with interrupt coalescence, and without interrupt coalescence.]

Fig. 2. Cumulative dispersion in a 100-packet train with and without IC.

Detection of IC: Here we describe the signature that can be used to detect the presence of IC, before describing specialized capacity estimation techniques that work in its presence.

In the presence of IC, many packets will be delivered from the NIC to the kernel and then to the application by a single interrupt. We say that the packets thus transferred to the application by a single interrupt form a burst. The packets in a burst will have a dispersion at the application layer equal to the latency T of the recvfrom system call. An application can measure T, and therefore determine which packets of a train form a burst. Therefore, if a large fraction of the packets in a received train form bursts, that indicates the presence of IC.

An important point to note is that context switching at the receiver can cause a similar signature in the dispersion of consecutive packets in a train. Figure 3 illustrates that context switching can also make many consecutive packets in a train arrive with the recvfrom latency. However, for a long enough train, there is a clear distinction between the signature of context switching and that of IC in terms of:

1. the number of bursts, which for IC is larger than some threshold,
2. the variation in the number of packets in a burst (the burst length), and
3. the variation in the dispersion of bursts;

i.e., the burst lengths and burst dispersions will be very regular under IC compared to those caused by context switching. These signatures, taken together, can be used to robustly detect IC on different hosts.
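As an illustration of this detection idea, the following sketch (ours, assuming application-level arrival timestamps and a measured recvfrom latency T, with purely illustrative thresholds) groups packets into bursts and checks the regularity of the burst lengths and burst-to-burst dispersions:

import statistics

def detect_ic(arrivals, T, slack=2.0, min_bursts=3, max_cv=0.2):
    """Group packets whose interarrival time is within 'slack' times the
    recvfrom latency T into bursts, then flag IC if there are several
    bursts and both the burst lengths and the burst-to-burst dispersions
    are regular (low coefficient of variation)."""
    bursts = [[arrivals[0]]]
    for prev, cur in zip(arrivals, arrivals[1:]):
        if cur - prev <= slack * T:
            bursts[-1].append(cur)      # same interrupt: spacing ~ recvfrom latency
        else:
            bursts.append([cur])        # new interrupt: much larger gap
    if len(bursts) < min_bursts:
        return False                    # too few bursts: more likely context switches
    lengths = [len(b) for b in bursts]
    gaps = [b2[0] - b1[0] for b1, b2 in zip(bursts, bursts[1:])]

    def cv(xs):                         # coefficient of variation
        return statistics.pstdev(xs) / statistics.mean(xs)

    return cv(lengths) <= max_cv and cv(gaps) <= max_cv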

Estimation with IC: Pathrate [9] uses the dispersion of consecutive packets in long packet trains to detect the presence of IC.


Fig. 3. Signature of context switch and interrupt coalescence in cumulative dispersion.

To estimate the path capacity in the presence of IC, pathrate relies on the dispersion of bursts, i.e., the interarrivals between successive bursts. Without significant cross traffic, the dispersion between the first packets of two consecutive bursts is equal to BL/C, where B is the length of the first burst in packets, L is the packet size, and C is the path capacity. The train length should be sufficiently long so that at least two bursts are observed in each received train. In practice, pathrate sends multiple packet trains and finally reports the median of their capacity estimates.
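Under this model, a post-processing step along the following lines (our sketch, not pathrate's actual implementation; it assumes packets have already been grouped into bursts, e.g., as in the earlier detection sketch) recovers one capacity estimate per pair of consecutive bursts:

def capacity_from_bursts(bursts, packet_size_bits):
    """Estimate C = B*L / dispersion for each pair of consecutive bursts,
    where B is the number of packets in the first burst, L the packet size
    in bits, and the dispersion is measured between the first packets of
    the two bursts. 'bursts' is a list of lists of arrival times (seconds)."""
    estimates = []
    for b1, b2 in zip(bursts, bursts[1:]):
        dispersion = b2[0] - b1[0]                       # first packet to first packet
        estimates.append(len(b1) * packet_size_bits / dispersion)
    return estimates                                     # bits per second

A pathrate-style run would repeat this over many received trains and report the median of all per-train estimates.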

3.2 Available Bandwidth Estimation

Several available bandwidth estimation tools examine the relationship between the sending rate Rs and the receive rate Rr of a packet stream to measure the end-to-end available bandwidth A [10, 11]. The main idea is that when Rs > A, then Rr < Rs, and Rr = Rs otherwise. Furthermore, when Rs > A, the queues at the tight link build up and packet i is expected to experience larger queuing delays than packet i − 1. Consequently, the one-way delay (OWD) of the ith packet is expected to be larger than the OWD of the (i − 1)th packet. On the other hand, if Rs < A, the probing packets do not build up the queue and therefore the OWDs of consecutive packets are expected to be equal. Pathload [12] and pathChirp [13] estimate the available bandwidth by looking at the OWD variation.
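For intuition, this test can be reduced to a check on the OWD trend of a received stream; the sketch below (our own, deliberately simpler than the statistical metrics pathload actually uses) flags a stream as having probed above the available bandwidth when most consecutive OWDs increase:

def probed_above_avail_bw(owds, threshold=0.66):
    """Return True if the fraction of consecutive OWD pairs that increase
    exceeds 'threshold', suggesting Rs > A (the stream loaded the tight
    link); a flat trend suggests Rs <= A."""
    if len(owds) < 2:
        return False
    increases = sum(1 for a, b in zip(owds, owds[1:]) if b > a)
    return increases / (len(owds) - 1) > threshold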

Figure 4 shows the OWDs of a 100-packet MTU stream with and without IC. The sender timestamps and sends packets at a rate of 1.5 Gbps, while the available bandwidth of the path is 940 Mbps. In the absence of IC, the OWDs of successive packets are clearly increasing, implying that the probing rate is greater than the available bandwidth. In the presence of IC, however, packets are buffered in the NIC until the interrupt timer expires and are then delivered back-to-back to the application. As before, we refer to all the packets delivered with a single interrupt as a burst.


Fig. 4. OWDs in a 100-packet train with and without IC.

Buffering packets in the NIC adds a queuing delay, which is highest for the first packet in a burst and lowest (or zero) for the last packet in the burst, thereby destroying the increasing OWD trend across successive packets. Suppose that s_k is the send time of the kth packet in a burst and t is the time when the interrupt is generated. Then the OWD of the kth packet is given by

d_k = t + k · r − s_k = t + k · r − (s_1 + k · g)

and the relative OWD of packet k + 1 is given by

d_{k+1} − d_k = r − g    (1)

where r is the time taken to transfer a packet from the NIC buffer to user space and g is the inter-packet gap at the sender. From equation (1), if g > r, then the successive OWDs in a burst will be decreasing. In Figure 4, g = 8µs and r ≈ 3µs, and there is a clear decreasing trend in the OWDs within each burst. Note that tools which use a packet pair [13, 14] or a short packet stream (< 8 packets) to probe at a certain rate will fail in the presence of IC, since there is no packet pair or short stream which shows the correct OWD trend. Another point to note is that, although the OWD trend is destroyed over short timescales, the overall increasing trend in the OWDs is still preserved. Based on this observation, we have developed an algorithm in pathload to measure available bandwidth in the presence of IC.
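With these numbers, equation (1) gives d_{k+1} − d_k ≈ 3µs − 8µs = −5µs, i.e., each packet within a burst arrives with a roughly 5µs smaller OWD than its predecessor.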

Estimation with IC: Pathload uses the technique described in the previous section to detect whether the NIC performs IC. The main idea behind estimating available bandwidth in the presence of IC is to use a filtering scheme that discards the OWDs of IC-affected packets. As mentioned earlier, when IC is enabled, the queuing delay in the NIC is highest for the first packet in a burst and lowest (or zero) for the last packet in the burst. This implies that the OWD of the last packet is least affected by IC and should be kept for further OWD analysis.

The last packet of each burst is identified by examining the interarrivals of packets in the stream. Suppose a_k is the application-layer arrival timestamp of the kth packet. All the packets in a burst are delivered with a single interrupt, and are therefore expected to be transferred from the NIC to user space with a spacing equal to the recvfrom latency T, where T is of the order of a few microseconds. However, the spacing between the last packet of burst k and the first packet of burst k + 1 is expected to be much greater than T. So pathload discards packet k if a_{k+1} − a_k ≈ T, and keeps it for further analysis otherwise.
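A minimal version of this filtering step could look as follows (our sketch of the rule just described, with T the measured recvfrom latency and 'slack' an assumed tolerance for deciding that two arrivals belong to the same burst):

def keep_last_of_each_burst(arrivals, owds, T, slack=2.0):
    """Keep only the OWDs of packets whose successor arrives much later
    than the recvfrom latency T, i.e., the last packet of each IC burst,
    whose OWD is least distorted by NIC buffering."""
    kept = []
    for k in range(len(arrivals) - 1):
        if arrivals[k + 1] - arrivals[k] > slack * T:    # packet k ends a burst
            kept.append(owds[k])
    kept.append(owds[-1])                                # final packet always ends a burst
    return kept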

3.3 Passive measurements

Passive measurements are also in widespread use on the Internet for many network measurement studies, and they are often praised for their non-intrusive nature. Passive monitors are often used to collect traces of packet interarrivals, and then study certain burstiness-related characteristics of network traffic. However, if the traces are collected with commodity NICs that perform IC, the packet interarrivals can be significantly altered, making the traces unusable for most such studies. Some of the consequences of IC in traffic monitoring are as follows. IC can make many packets appear to arrive at the same time, making the trace look much burstier than the traffic actually is. It can also make the variation in available bandwidth appear higher than is actually the case. Finally, IC destroys the correlation structure of the packet arrivals.

It is noted that certain specialized passive monitors timestamp each packet at the NIC, avoiding the negative effects of IC [15].

3.4 Breakdown of TCP’s self-clocking

Another negative effect of IC is that it can break TCP’s self-clocking [16]. Specifically, the TCP sender attempts to establish its self-clock, i.e., to determine how often it should be sending packets, based on the dispersion of the received ACKs. To preserve the dispersion of ACKs at the sender, or the dispersion of data segments at the receiver, packets must be delivered to TCP as soon as they arrive at the NIC. IC, however, results in bursty delivery of data segments to the receiver, destroying the dispersion that those packets had in the network. The bursty arrival of segments at the receiver causes bursty transmission of ACKs, and subsequent bursty transmission of more data segments from the sender. The problem with these bursts is that TCP can experience significant losses in under-buffered network paths, even if the corresponding links are not saturated.

Figure 5 shows the CDF of ACK interarrivals in a 10-second TCP connection over a GigE path. Without IC, most ACKs arrive approximately every 24µs, corresponding to the transmission time of two MTU packets at a GigE interface. This is because TCP uses delayed ACKs, acknowledging every second packet. With IC, however, approximately 65% of the ACKs arrive with an erratic dispersion of less than 1µs, as they are delivered to the kernel with a single interrupt. These “batched” ACKs trigger the transmission of long bursts of data packets from the sender. Note that the rest of the ACKs, even though non-batched, still have a dispersion that does not correspond to the true capacity of this path, misleading the TCP sender about the actual rate that this path can sustain.
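The 24µs mode is consistent with back-to-back MTU segments on a GigE link: 2 × 1500 bytes × 8 bits/byte / 10^9 bits/s = 24µs (slightly more once Ethernet framing overhead is counted).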


Fig. 5. CDF of ACK interarrivals in a 10 second TCP connection over a GigE path.

4 Conclusions

In this paper, we described how IC is implemented on commodity NICs. IC has been used extensively to reduce the interrupt processing overhead, and it becomes necessary when the ratio of the packet interarrival time to the per-packet processing time becomes small. We have identified scenarios where IC has negative effects on tools and techniques that rely on accurate measurement of network properties. Specifically, we showed that capacity estimation tools based on packet dispersion and available bandwidth measurement tools based on the receive rate can give inaccurate results. We have developed new algorithms to robustly detect whether IC is enabled on a measurement host and to estimate capacity and available bandwidth in the presence of IC. IC can also affect the interarrivals of packets if commodity NICs are used as passive monitors, and users of such traces should take this into account in their studies. We also demonstrated that IC can break down TCP’s self-clocking algorithm by delivering data packets and ACKs in bursts to the protocol stack.

References

1. Mogul, J.C., Ramakrishnan, K.K.: Eliminating receive livelock in an interrupt-driven kernel. ACM Transactions on Computer Systems 15 (1997) 217–252

2. Prasad, R.S., Murray, M., Dovrolis, C., Claffy, K.: Bandwidth Estimation: Metrics, Measurement Techniques, and Tools. IEEE Network 17 (2003) 18–23

3. Dovrolis, C., Ramanathan, P., Moore, D.: Packet Dispersion Techniques and Capacity Estimation. Technical report, University of Delaware (2002). Submitted for publication to the IEEE/ACM Transactions on Networking.

4. Jain, M., Dovrolis, C.: End-to-End Available Bandwidth: Measurement Methodology, Dynamics, and Relation with TCP Throughput (2003)

5. Intel: Interrupt Moderation Using Intel Gigabit Ethernet Controllers. http://www.intel.com/design/network/applnots/ap450.pdf (2003)

6. Syskonnect: SK-NET GE Gigabit Ethernet Server Adapter. http://www.syskonnect.com/syskonnect/technology/SK-NET GE.PDF (2003)

7. Intel Gigabit Ethernet Driver. http://sourceforge.net/projects/e1000 (2003)

8. Syskonnect: SysKonnect Gigabit Ethernet Driver. http://www.syskonnect.com/syskonnect/support/driver/ge.htm (2003)

9. Dovrolis, C., Ramanathan, P., Moore, D.: What do Packet Dispersion Techniques Measure? In: Proceedings of IEEE INFOCOM. (2001) 905–914

10. Melander, B., Bjorkman, M., Gunningberg, P.: A New End-to-End Probing and Analysis Method for Estimating Bandwidth Bottlenecks. In: IEEE Global Internet Symposium. (2000)

11. Hu, N., Steenkiste, P.: Evaluation and Characterization of Available Bandwidth Probing Techniques. IEEE Journal on Selected Areas in Communications (2003)

12. Jain, M., Dovrolis, C.: End-to-End Available Bandwidth: Measurement Methodology, Dynamics, and Relation with TCP Throughput. In: Proceedings of ACM SIGCOMM. (2002) 295–308

13. Ribeiro, V., Riedi, R., Baraniuk, R., Navratil, J., Cottrell, L.: pathChirp: Efficient Available Bandwidth Estimation for Network Paths. In: Proceedings of Passive and Active Measurements (PAM) workshop. (2003)

14. Pasztor, A.: Accurate Active Measurement in the Internet and its Applications. PhD thesis, The University of Melbourne (2003)

15. DAG Project. http://dag.cs.waikato.ac.nz/

16. Jacobson, V.: Congestion Avoidance and Control. ACM Computer Communication Review 18(4) (1988) 314–329; Proceedings of the SIGCOMM ’88 Symposium, Stanford, CA, August 1988.

