LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Gary Robertson, LCA14, Macau

In this presentation we will try to illustrate the pros, cons, and latency characteristics of several Linux kernel preemption models, and provide some guidance in selecting an appropriate preemption model for a given category of application.

Overview of Topics Presented

Questions we will address include:

• Which preemption model provides the best throughput?
• Which model offers the lowest average latencies?
• Which model offers the lowest maximum latencies?
• Which model offers the most predictable latencies?
• How do load conditions impact the respective latency performance of the various models?
• What impact does CPU Frequency Scaling or CPU sleep states have on latency performance?
• What is the best model for a given application type?

Overview of Topics - continued

Our intent is to show relative trends between the preemption models under the same conditions, so the data presented were gathered as follows:

• Each preemption model configuration was tested using identical tests running on the same InSignal Arndale.

• Cyclictest was used for a run duration of two hours with a single thread executing at a SCHED_FIFO priority of 80 to realistically represent scheduling latency for a real-time process.

• A cyclictest run was done with no system load, then another with an externally-applied ping flood, and another with back-to-back executions of hackbench running to represent maximum system loading.
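The kind of measurement cyclictest performs can be sketched as below. This is only an illustration of the idea, not cyclictest itself: a thread sleeps until an absolute deadline and records how late it actually woke up. The function names `try_set_fifo` and `measure_wakeup_latencies` are hypothetical, and an interpreted loop adds overhead of its own, so the numbers it produces are not comparable to the figures in this deck.

```python
import os
import time


def try_set_fifo(priority=80):
    """Request SCHED_FIFO at the given priority; return True on success.

    This needs Linux and sufficient privileges, so failure is tolerated.
    """
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except (AttributeError, OSError):
        return False


def measure_wakeup_latencies(loops=1000, interval_us=1000):
    """Return a list of wake-up latencies in microseconds.

    Each iteration sleeps until an absolute deadline and records the
    difference between the actual wake-up time and that deadline.
    """
    interval_ns = interval_us * 1000
    latencies = []
    deadline = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        woke = time.monotonic_ns()
        latencies.append((woke - deadline) // 1000)
        deadline += interval_ns
    return latencies


if __name__ == "__main__":
    realtime = try_set_fifo(80)
    samples = measure_wakeup_latencies(loops=200, interval_us=1000)
    print("SCHED_FIFO granted:", realtime)
    print("min/avg/max usec:", min(samples), sum(samples) // len(samples), max(samples))
```

Using absolute deadlines rather than relative sleeps keeps one late wake-up from shifting every later sample.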

Test Rationale and Methodology

Latency Impact of CPU Frequency Scaling
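The latency runs reported later in this deck were taken with CPU frequency scaling disabled. The deck does not say how that was done. One common approach, assumed here, is to pin every CPU to the `performance` cpufreq governor. The sketch below reads the standard `scaling_governor` sysfs files to check for that; the function names are hypothetical and the `cpu_root` parameter exists only so the code can be pointed at a test directory.

```python
import glob
import os


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU name (for example 'cpu0') to its cpufreq governor."""
    governors = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # Path layout: <cpu_root>/<cpuN>/cpufreq/scaling_governor
        cpu_name = path.split(os.sep)[-3]
        with open(path) as handle:
            governors[cpu_name] = handle.read().strip()
    return governors


def scaling_is_pinned(governors):
    """True if every CPU uses the fixed-frequency 'performance' governor."""
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    found = read_governors()
    print(found)
    print("frequency pinned:", scaling_is_pinned(found))
```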

Only three Linux preemption models are really interesting for anything other than desktop use:

• the Server preemption model provides optimal throughput for applications where latencies are not an issue

• the Low Latency Desktop preemption model provides low average latencies for interactive and ‘soft real-time’ applications

• the Full RT preemption model provides the highest level of latency determinism for ‘hard real-time’ applications
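To check which of these models a given kernel was built with, one can inspect its `.config`. The sketch below assumes the deck's model names correspond to the usual Kconfig choices: `CONFIG_PREEMPT_NONE` for Server, `CONFIG_PREEMPT` for Low Latency Desktop, and `CONFIG_PREEMPT_RT_FULL` (the symbol used by the RT patch set of that era) for Full RT. The deck itself does not name these symbols, and the function names are hypothetical.

```python
def enabled_symbols(config_text):
    """Return the set of Kconfig symbols set to 'y' in a kernel .config."""
    symbols = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and line.endswith("=y"):
            symbols.add(line[: -len("=y")])
    return symbols


def preemption_model(config_text):
    """Name the preemption model selected by a kernel .config."""
    symbols = enabled_symbols(config_text)
    # Check the RT symbol first: RT kernels typically also set CONFIG_PREEMPT.
    if "CONFIG_PREEMPT_RT_FULL" in symbols:
        return "Full RT"
    if "CONFIG_PREEMPT" in symbols:
        return "Low Latency Desktop"
    if "CONFIG_PREEMPT_VOLUNTARY" in symbols:
        return "Desktop (voluntary preemption)"
    if "CONFIG_PREEMPT_NONE" in symbols:
        return "Server"
    return "unknown"


if __name__ == "__main__":
    sample = "# CONFIG_PREEMPT_NONE is not set\nCONFIG_PREEMPT=y\n"
    print(preemption_model(sample))
```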

Tested Linux Preemption Models

Server Preemption Model Latencies

Cyclictest with no system load
CPU frequency scaling disabled

Minimum Latency: 16 usec
Average Latency: 24 usec
Most Frequent Latency: 24 usec
Maximum Latency: 572 usec
Standard Deviation: 1.211041

Almost all latencies between 20 usec and 28 usec

However, even at light loads, latencies as high as 572 usec were observed. This is a consequence of all code paths through the kernel being non-preemptible.
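The summary figures quoted for each run (minimum, average, most frequent, maximum, standard deviation) can all be derived from a latency histogram of the form `{latency_usec: sample_count}`. The sketch below shows one way to do that. The histogram in the usage example is made up for illustration and is not the deck's data, and the population form of the standard deviation is an assumption, since the deck does not say which form was used.

```python
import math


def summarize(histogram):
    """Summarize a {latency_usec: count} histogram of latency samples."""
    # Ignore empty buckets so they do not distort min and max.
    observed = {lat: n for lat, n in histogram.items() if n > 0}
    total = sum(observed.values())
    mean = sum(lat * n for lat, n in observed.items()) / total
    variance = sum(n * (lat - mean) ** 2 for lat, n in observed.items()) / total
    return {
        "min": min(observed),
        "avg": mean,
        "most_frequent": max(observed, key=observed.get),
        "max": max(observed),
        "stddev": math.sqrt(variance),
    }


if __name__ == "__main__":
    # Hypothetical histogram, for illustration only.
    print(summarize({20: 1, 24: 6, 28: 1}))
```

A small standard deviation alongside a large maximum, as in the Server run above, indicates a tight cluster of samples with rare but long outliers.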


Cyclictest with ping flood load
CPU frequency scaling disabled


Almost all latencies between 20 usec and 28 usec

Note, however, that much longer latencies continue to be observed due to lack of any design efforts to avoid them. Also note that maximum latency is already beginning to creep upwards.


Cyclictest with hackbench load
CPU frequency scaling disabled


The majority of latencies were between 21 usec and 25 usec, gradually tapering off to single digit frequencies at 204 usec. Note that the maximum latency is 4000 times longer than under no load!

Note also that the most frequent latency accounts for a much smaller percentage of the samples. A larger percentage of samples therefore fell at higher latencies, which illustrates the serious degradation of latency determinism under load in a non-preemptible kernel where latency was not a primary design consideration.

Low Latency Desktop Model Latencies



The majority of latencies were between 28 usec and 31 usec, quickly tapering off to single digit frequencies at 42 usec.

Maximum latency was reduced tenfold under light loads vs. the Server model. This illustrates the significant improvements in latency performance under light loads with kernel preemption enabled.


Cyclictest with ping flood
CPU frequency scaling disabled



The reduced range of observed latencies indicates improved latency performance and predictability at moderate loads versus the Server model. However, as the next slide will show, latency performance in this model degrades seriously under heavy load, making Full RT a better choice for latency performance under heavy load conditions.


Cyclictest with hackbench
CPU frequency scaling disabled

Minimum Latency: 19 usec
Average Latency: 370606 usec
Most Frequent Latency: 25 usec
Maximum Latency: 4122148 usec
Standard Deviation: 826092

The majority of latencies were between 24 usec and 26 usec, gradually tapering off to single digit frequencies at 105 usec. Note that the max latency was 70,000 times longer than with this model under no load!

Maximum latencies were nearly double those of the Server model under heavy load, and latency predictability is low. This illustrates the combined impacts under heavy load of increased context switches without addressing priority inversion or FIFO queueing disciplines.
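The ratios quoted on the slides can be cross-checked against the two maximum latencies the deck states explicitly. The arithmetic below is only a rough consistency check: the multipliers (4000 times, 70,000 times) are the author's rounded figures, so the implied values are approximations and not measured data.

```python
# Values stated in the deck.
LLD_MAX_HEAVY_USEC = 4122148   # Low Latency Desktop, hackbench load
SERVER_MAX_IDLE_USEC = 572     # Server, no load

# Rounded ratios quoted on the slides.
LLD_HEAVY_VS_IDLE = 70000      # LLD max under hackbench vs. no load
SERVER_HEAVY_VS_IDLE = 4000    # Server max under hackbench vs. no load

# Implied (not measured) maxima derived from those ratios.
lld_max_idle = LLD_MAX_HEAVY_USEC / LLD_HEAVY_VS_IDLE
server_max_heavy = SERVER_MAX_IDLE_USEC * SERVER_HEAVY_VS_IDLE

print(f"implied LLD max, no load:        {lld_max_idle:.0f} usec")
print(f"implied Server max, heavy load:  {server_max_heavy} usec")
print(f"Server vs. LLD max, no load:     {SERVER_MAX_IDLE_USEC / lld_max_idle:.1f}x")
print(f"LLD vs. Server max, heavy load:  {LLD_MAX_HEAVY_USEC / server_max_heavy:.1f}x")
```

The last two ratios come out close to the "tenfold" and "nearly double" comparisons made in the text, so the quoted figures are mutually consistent.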

Full RT Preemption Model Latencies




Maximum latency was reduced tenfold under light loads vs. the Server model. This illustrates the significant improvements in latency performance under light loads with kernel preemption enabled.

Under light load, performance is very similar to that of the Low Latency Desktop model.


Cyclictest with ping flood
CPU frequency scaling disabled



The reduced range of observed latencies indicates improved latency performance and predictability at moderate loads versus the Server model.

Note that even at moderate loads the maximum latencies are less than half the duration of those seen in the Low Latency Desktop model.


Cyclictest with hackbench load
CPU frequency scaling disabled


The majority of latencies were between 24 usec and 26 usec, with a second group peaking between 43 and 44 usec, and quickly tapering off to single digit frequencies at 134 usec.

Latency performance under heavy load is much better than in any of the other preemption models. With threaded interrupt handlers, priority inheritance and priority-based queuing disciplines, the real-time process is still able to meet much tighter scheduling deadlines despite heavy activity of other lower-priority threads.

Comparative Latency Performance



• For applications in which throughput rather than latency is the primary consideration, opt for the Server model

• If quality of service is important but missed latency deadlines will not result in catastrophic failures, opt for the Low Latency Desktop model and size the hardware capacity to keep loading moderate

• For environments that host ‘zero overhead Linux’ applications (ODP, for example), Low Latency Desktop is a good choice

• If latencies must be consistent even under high load conditions, Full RT may be required

• For applications based on POSIX real-time scheduling and priority-based preemption, use Full RT for best results
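The guidance above can be restated as a small decision helper. This is just one reading of the bullets, with hypothetical parameter names; real selection would also weigh the maintenance and compatibility costs discussed in Appendix A.

```python
def recommend_model(needs_consistent_latency_under_load=False,
                    uses_posix_rt_priorities=False,
                    quality_of_service_matters=False,
                    hosts_zero_overhead_linux=False):
    """Return the preemption model suggested by the deck's guidance."""
    # Determinism under heavy load or priority-based POSIX real-time design
    # points to Full RT.
    if needs_consistent_latency_under_load or uses_posix_rt_priorities:
        return "Full RT"
    # Soft deadlines with moderate loading, or hosting 'zero overhead
    # Linux' applications, point to Low Latency Desktop.
    if quality_of_service_matters or hosts_zero_overhead_linux:
        return "Low Latency Desktop"
    # Otherwise throughput is the primary consideration.
    return "Server"


if __name__ == "__main__":
    print(recommend_model(quality_of_service_matters=True))
```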

What Preemption Model Is Best for Me?

The test scripts, data files, and graphs used to provide reference data for this presentation may be accessed online at the following URL:

http://people.linaro.org/~gary.robertson/LCA14

Data References



More about Linaro Connect: http://connect.linaro.org
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
Linaro members: www.linaro.org/members

Preemption Model Characteristics

Appendix A

The Server preemption model lies at one extreme of the latency vs. throughput continuum.

Pros include:
• Simplicity, maturity and robustness make this a very reliable platform
• With no preemption, the reduced number of context switches minimizes system overhead and maximizes overall throughput

Server Model Characteristics

Cons include:
• The lack of preemption results in low average latencies under low loads but much higher latencies when the system is heavily loaded
• The latencies imposed by different execution paths through the kernel result in a wide range of latency durations and low latency determinism

Server Model Characteristics

The Low Latency Desktop preemption model holds the middle ground in the latency vs. throughput continuum.

Pros include:
• Under low to moderate load, latency range and predictability are significantly improved vs. the Server model
• This preemption model is supported as part of the mainstream kernel and tends to be less trouble-prone than Full RT preemption

Low Latency Desktop Model Characteristics

Cons include:
• The preemption of kernel operations and increased number of context switches create increased overhead and reduced performance relative to the Server model
• The preemption of kernel operations results in higher average latencies vs. the Server preemption model
• This preemption model does not perform as well under heavy system loads as other models

Low Latency Desktop Model Characteristics

The following software-induced latency sources remain problematic in the Low Latency Desktop preemption model:

• Exceptions, software interrupts, and device service request interrupts execute outside of scheduler control

• Most mutual exclusion locking primitives are subject to priority inversion

• Shared resources use FIFO-based queueing disciplines, meaning high-priority threads may have to wait behind lower priority threads for access to the resources

These factors result in lower levels of latency determinism.

Low Latency Desktop Model - continued

The Full RT preemption model represents the latency-centric end of the latency vs. throughput continuum. It attempts to mitigate all the remaining software-induced sources of latency:
• Handlers for exceptions, software interrupts, and device service request interrupts are encapsulated inside threads which are under scheduler control
• Priority inheritance is added for most mutual exclusion locking primitives to prevent priority inversions
• Shared resources use priority-based queueing disciplines so that the highest-priority thread always gets first access to the resources
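The difference between FIFO and priority-based queueing for a shared resource can be shown with a toy model. This is a simplified sketch and not kernel code: waiters are `(name, priority)` pairs in arrival order, a higher number means a higher priority, and all names are hypothetical.

```python
import bisect
from collections import deque


def fifo_grant_order(waiters):
    """Grant order when waiters are served strictly in arrival order."""
    queue = deque(waiters)
    return [queue.popleft()[0] for _ in range(len(waiters))]


def priority_grant_order(waiters):
    """Grant order when the waiter queue is kept sorted by priority.

    Higher number means higher priority; ties keep arrival order.
    """
    queue = []
    for arrival, (name, priority) in enumerate(waiters):
        # A sorted insert on every enqueue is the extra cost of this discipline.
        bisect.insort(queue, (-priority, arrival, name))
    return [name for _, _, name in queue]


if __name__ == "__main__":
    arrivals = [("logger", 10), ("worker", 20), ("rt_control", 80)]
    print("FIFO:    ", fifo_grant_order(arrivals))
    print("Priority:", priority_grant_order(arrivals))
```

With FIFO queueing the high-priority `rt_control` waiter is served last because it arrived last; with the sorted queue it is served first.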

Full RT Model Characteristics

The Full RT preemption model inevitably suffers from reduced overall throughput as a consequence of its efforts to maximize latency determinism:
• Schedulable ‘threaded’ ISRs result in the highest levels of preemption and context switch overhead
• Priority inheritance involves iterative logic to temporarily boost the priorities of all lock holders to equal that of the highest-priority lock waiters. This adds significant overhead to locking primitive code.
• Priority-based queueing requires sorting the queue each time a new waiting thread is added
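The iterative boosting that priority inheritance requires can be illustrated with a toy model. This is a deliberately simplified sketch, not the kernel's rt-mutex implementation: `Task`, `Lock`, `block_on` and `release` are hypothetical names, a higher number means a higher priority, and `release` assumes the task holds no other contended lock.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority
        self.priority = priority   # effective priority, may be boosted
        self.blocked_on = None     # Lock this task is waiting for, if any


class Lock:
    def __init__(self, name):
        self.name = name
        self.owner = None


def block_on(task, lock):
    """Record that `task` waits for `lock` and boost the chain of owners.

    Returns the number of tasks whose priority was raised.
    """
    task.blocked_on = lock
    boosted = 0
    owner = lock.owner
    # Walk the chain: owner, the lock it waits on, that lock's owner, ...
    while owner is not None and owner.priority < task.priority:
        owner.priority = task.priority
        boosted += 1
        next_lock = owner.blocked_on
        owner = next_lock.owner if next_lock is not None else None
    return boosted


def release(task, lock):
    """Drop the lock and end the temporary boost (simplified)."""
    lock.owner = None
    task.priority = task.base_priority


if __name__ == "__main__":
    low, mid, high = Task("low", 10), Task("mid", 20), Task("high", 80)
    l1, l2 = Lock("l1"), Lock("l2")
    l1.owner, l2.owner = low, mid
    block_on(mid, l1)            # mid waits on low
    print(block_on(high, l2))    # high waits on mid, which waits on low
    print(low.priority, mid.priority)
```

The chain walk on every contended lock operation is the overhead the bullet above refers to: its cost grows with the depth of the lock dependency chain.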

Full RT Model Characteristics - continued

Pros include:
• The most consistent and predictable latency performance available in any preemption model
• The best support environment for creating priority-based multi-layered applications
• The best hard real-time support available in a Linux environment


Cons include:
• Full RT preemption is supported only with a separately maintained kernel patch set
• The latest supported RT kernel version always lags behind mainstream development
• Mainstream drivers, libraries, and applications may not always function properly in the Full RT environment
• Poorly designed or written real-time threads may starve out threaded interrupt handlers


Date post:	15-Jan-2015