
LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Date post: 15-Jan-2015
Upload: linaro
Resource: LCA14
Name: LCA14-506: Comparative analysis of preemption vs preempt-rt
Date: 07-03-2014
Speaker: Gary Robertson
Video: https://www.youtube.com/watch?v=QiguBicpB88
Website: http://www.linaro.org/
Linaro Connect: http://connect.linaro.org/
Slide: http://www.slideshare.net/linaroorg/lca14-lca14506-comparative-analysis-of-preemption-vs-preemptrt
Transcript
Page 1: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Gary Robertson, LCA14, Macau

LCA14-506: Comparative analysis of preemption vs preempt-rt

Page 2: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Overview of Topics Presented

In this presentation we will illustrate the pros, cons, and latency characteristics of several Linux kernel preemption models, and provide some guidance on selecting an appropriate preemption model for a given category of application.

Page 3: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Overview of Topics - continued

Questions we will address include:

• Which preemption model provides the best throughput?

• Which model offers the lowest average latencies?

• Which model offers the lowest maximum latencies?

• Which model offers the most predictable latencies?

• How do load conditions impact the respective latency performance of the various models?

• What impact does CPU Frequency Scaling or CPU sleep states have on latency performance?

• What is the best model for a given application type?

Page 4: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Test Rationale and Methodology

Our intent is to show relative trends between the preemption models under the same conditions, so the data presented were gathered as follows:

• Each preemption model configuration was tested using identical tests running on the same InSignal Arndale board.

• Cyclictest was run for two hours with a single thread executing at a SCHED_FIFO priority of 80, to realistically represent scheduling latency for a real-time process.

• One cyclictest run was done with no system load, another with an externally applied ping flood, and another with back-to-back executions of hackbench to represent maximum system loading.
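The measurement idea behind the methodology above can be sketched in a few lines: sleep until a planned wake-up time, then record how late the wake-up actually was. This is only an illustration and not cyclictest itself, which is a C program. The function name and parameters are hypothetical, and interpreter overhead makes the absolute numbers meaningless for comparison with the slides.

```python
import time


def measure_wakeup_latencies(interval_us=1000, loops=200):
    """Return a list of wake-up latencies in microseconds.

    Each iteration sleeps until a planned absolute wake-up time and
    records the difference between the planned and the observed time.
    """
    interval_ns = interval_us * 1000
    latencies = []
    next_wake = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        delay_ns = next_wake - time.monotonic_ns()
        if delay_ns > 0:
            time.sleep(delay_ns / 1e9)
        now = time.monotonic_ns()
        latencies.append((now - next_wake) / 1000.0)
        # Advance the planned time by a fixed step so that one late
        # wake-up does not shift every later deadline.
        next_wake += interval_ns
    return latencies


if __name__ == "__main__":
    samples = measure_wakeup_latencies()
    print("min %.1f usec, avg %.1f usec, max %.1f usec" % (
        min(samples), sum(samples) / len(samples), max(samples)))
```

Running the loop once on an idle system and once under load gives a rough feel for how the spread of the samples widens, which is the effect the following slides quantify.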

Page 5: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Latency Impact of CPU Frequency Scaling

Page 6: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Tested Linux Preemption Models

Only three Linux preemption models are really interesting for anything other than desktop use:

• the Server preemption model provides optimal throughput for applications where latencies are not an issue

• the Low Latency Desktop preemption model provides low average latencies for interactive and ‘soft real-time’ applications

• the Full RT preemption model provides the highest level of latency determinism for ‘hard real-time’ applications

Page 7: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Server Preemption Model Latencies

Cyclictest with no system load
CPU frequency scaling disabled

Minimum Latency: 16 usec
Average Latency: 24 usec
Most Frequent Latency: 24 usec
Maximum Latency: 572 usec
Standard Deviation: 1.211041

Almost all latencies between 20 usec and 28 usec

However, even at light loads, latencies out to 572 usec were observed. This is a consequence of all code paths through the kernel being non-preemptible.
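The five figures reported on each results slide can all be derived from a latency histogram (latency value mapped to number of occurrences). The sketch below shows one way to do that. The histogram contents are made-up illustration values, the function name is hypothetical, and the use of the population standard deviation is an assumption, since the slides do not say which form was used.

```python
import math


def histogram_stats(hist):
    """Summarize a histogram mapping latency (usec) -> sample count."""
    bins = {lat: n for lat, n in hist.items() if n > 0}
    total = sum(bins.values())
    mean = sum(lat * n for lat, n in bins.items()) / total
    variance = sum(n * (lat - mean) ** 2 for lat, n in bins.items()) / total
    return {
        "min": min(bins),
        "avg": mean,
        "most_frequent": max(bins, key=bins.get),
        "max": max(bins),
        "stddev": math.sqrt(variance),
    }


if __name__ == "__main__":
    # Hypothetical data, not taken from the presentation.
    example = {16: 1, 23: 3, 24: 5, 25: 1}
    for name, value in histogram_stats(example).items():
        print(name, value)
```

A single outlier such as the 572 usec maximum barely moves the average or the standard deviation when millions of samples cluster tightly, which is why the maximum is reported separately.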

Page 8: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Server Preemption Model Latencies

Cyclictest with ping flood load
CPU frequency scaling disabled

Minimum Latency: 15 usec
Average Latency: 23 usec
Most Frequent Latency: 24 usec
Maximum Latency: 592 usec
Standard Deviation: 1.580778

Almost all latencies between 20 usec and 28 usec

Note, however, that much longer latencies continue to be observed due to lack of any design efforts to avoid them. Also note that maximum latency is already beginning to creep upwards.

Page 9: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Server Preemption Model Latencies

Cyclictest with hackbench load
CPU frequency scaling disabled

Minimum Latency: 17 usec
Average Latency: 150655 usec
Most Frequent Latency: 22 usec
Maximum Latency: 2587753 usec
Standard Deviation: 493977.9

The majority of latencies were between 21 usec and 25 usec, gradually tapering off to single digit frequencies at 204 usec. Note that the maximum latency is roughly 4,500 times longer than under no load!

Note also the much lower frequency percentage for the peak occurrence. This means a larger percentage of the higher latencies were observed, and illustrates the serious degradation of latency determinism under load in a non-preemptible kernel where latency was not a primary design consideration.

Page 10: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Low Latency Desktop Model Latencies

Cyclictest with no system load
CPU frequency scaling disabled

Minimum Latency: 19 usec
Average Latency: 28 usec
Most Frequent Latency: 29 usec
Maximum Latency: 57 usec
Standard Deviation: 0.8698308

The majority of latencies were between 28 usec and 31 usec, quickly tapering off to single digit frequencies at 42 usec.

Maximum latency was reduced tenfold under light loads vs. the Server model. This illustrates the significant improvements in latency performance under light loads with kernel preemption enabled.

Page 11: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Low Latency Desktop Model Latencies

Cyclictest with ping flood
CPU frequency scaling disabled

Minimum Latency: 18 usec
Average Latency: 29 usec
Most Frequent Latency: 29 usec
Maximum Latency: 131 usec
Standard Deviation: 1.79573

The majority of latencies were between 28 usec and 32 usec, quickly tapering off to single digit frequencies at 80 usec.

The reduced range of observed latencies indicates improved latency performance and predictability at moderate loads versus the Server model. However, as the next slide will show, latency performance in this model degrades seriously under heavy load, making Full RT a better choice for latency performance under heavy load conditions.

Page 12: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Low Latency Desktop Model Latencies

Cyclictest with hackbench
CPU frequency scaling disabled

Minimum Latency: 19 usec
Average Latency: 370606 usec
Most Frequent Latency: 25 usec
Maximum Latency: 4122148 usec
Standard Deviation: 826092

The majority of latencies were between 24 usec and 26 usec, gradually tapering off to single digit frequencies at 105 usec. Note that the max latency was 70,000 times longer than with this model under no load!

Max latencies were roughly 1.6 times those of the Server model under heavy load, and latency predictability is low. This illustrates the combined impact under heavy load of increased context switching while priority inversion and FIFO queueing disciplines remain unaddressed.

Page 13: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Full RT Preemption Model Latencies

Cyclictest with no system load
CPU frequency scaling disabled

Minimum Latency: 19 usec
Average Latency: 29 usec
Most Frequent Latency: 29 usec
Maximum Latency: 53 usec
Standard Deviation: 1.031893

The majority of latencies were between 29 usec and 31 usec, quickly tapering off to single digit frequencies at 50 usec.

Maximum latency was reduced tenfold under light loads vs. the Server model. This illustrates the significant improvements in latency performance under light loads with kernel preemption enabled.

Under light load, performance is very similar to that of the Low Latency Desktop model.

Page 14: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Full RT Preemption Model Latencies

Cyclictest with ping flood
CPU frequency scaling disabled

Minimum Latency: 19 usec
Average Latency: 29 usec
Most Frequent Latency: 30 usec
Maximum Latency: 59 usec
Standard Deviation: 2.698587

The majority of latencies were between 29 usec and 31 usec, quickly tapering off to single digit frequencies at 53 usec.

The reduced range of observed latencies indicates improved latency performance and predictability at moderate loads versus the Server model.

Note that even at moderate loads the maximum latencies are less than half the duration of those seen in the Low Latency Desktop model.

Page 15: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Full RT Preemption Model Latencies

Cyclictest with hackbench load
CPU frequency scaling disabled

Minimum Latency: 21 usec
Average Latency: 29 usec
Most Frequent Latency: 25 usec
Maximum Latency: 156 usec
Standard Deviation: 7.69571

The majority of latencies were between 24 usec and 26 usec, with a second group peaking between 43 and 44 usec, and quickly tapering off to single digit frequencies at 134 usec.

Latency performance under heavy load is much better than in any of the other preemption models. With threaded interrupt handlers, priority inheritance and priority-based queuing disciplines, the real-time process is still able to meet much tighter scheduling deadlines despite heavy activity of other lower-priority threads.

Page 16: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Comparative Latency Performance

Page 17: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Comparative Latency Performance

Page 18: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Comparative Latency Performance
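The comparison slides are graphs in the original deck and do not survive in the transcript. As a stand-in, the sketch below tabulates the minimum, average, and maximum latencies reported on the preceding slides and derives two comparisons from them. The numbers are copied from the slides; the table layout and function names are hypothetical.

```python
# (min, avg, max) latency in usec, copied from the per-model result slides.
RESULTS = {
    "Server": {
        "no load": (16, 24, 572),
        "ping flood": (15, 23, 592),
        "hackbench": (17, 150655, 2587753),
    },
    "Low Latency Desktop": {
        "no load": (19, 28, 57),
        "ping flood": (18, 29, 131),
        "hackbench": (19, 370606, 4122148),
    },
    "Full RT": {
        "no load": (19, 29, 53),
        "ping flood": (19, 29, 59),
        "hackbench": (21, 29, 156),
    },
}


def max_latency_growth(model):
    """Ratio of maximum latency under hackbench to that under no load."""
    return RESULTS[model]["hackbench"][2] / RESULTS[model]["no load"][2]


def best_model(load, column):
    """Model with the lowest value in a column (0=min, 1=avg, 2=max)."""
    return min(RESULTS, key=lambda model: RESULTS[model][load][column])


if __name__ == "__main__":
    for model in RESULTS:
        print("%-20s max latency grows %.1fx under hackbench"
              % (model, max_latency_growth(model)))
    for load in ("no load", "ping flood", "hackbench"):
        print("%-10s lowest avg: %-8s lowest max: %s"
              % (load, best_model(load, 1), best_model(load, 2)))
```

On these figures the Server model has the lowest average latency under no load and ping flood, while Full RT has the lowest maximum latency under every load and the lowest average under hackbench.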

Page 19: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

What Preemption Model Is Best for Me?

• For applications in which throughput, not latency, is the primary consideration, opt for the Server model

• If quality of service is important but missed latency deadlines will not result in catastrophic failures, opt for the Low Latency Desktop model and size the hardware capacity to keep loading moderate

• For host environments for ‘zero overhead Linux’ (ODP for example), Low Latency Desktop is a good choice

• If latencies must be consistent even under high load conditions, Full RT may be required

• For applications based on POSIX real-time scheduling and priority-based preemption, use Full RT for best results
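For readers unfamiliar with POSIX real-time scheduling, the sketch below shows how a process might request the SCHED_FIFO policy at priority 80, the setting used for the cyclictest thread in these tests. It relies on Python's Linux-only `os.sched_*` wrappers; the function name is hypothetical, and the call normally needs root or the CAP_SYS_NICE capability, so the sketch reports failure instead of raising.

```python
import os


def try_enable_fifo(priority=80):
    """Try to switch the calling process to SCHED_FIFO.

    Returns True on success and False if the platform lacks the call
    or the kernel refuses the request (for example, no privilege).
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # Not available on this platform.
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lowest <= priority <= highest:
        return False
    try:
        # A pid of 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    if try_enable_fifo(80):
        print("running under SCHED_FIFO at priority 80")
    else:
        print("could not switch to SCHED_FIFO (insufficient privilege?)")
```

The same request works under any of the preemption models; what changes between them is how quickly the kernel can actually hand the CPU to such a thread once it becomes runnable.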

Page 20: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Data References

The test scripts, data files, and graphs used to provide reference data for this presentation may be accessed online at the following URL:

http://people.linaro.org/~gary.robertson/LCA14

Page 21: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

More about Linaro Connect: http://connect.linaro.org

More about Linaro: http://www.linaro.org/about/

More about Linaro engineering: http://www.linaro.org/engineering/

Linaro members: www.linaro.org/members

Page 22: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Appendix A

Preemption Model Characteristics

Page 23: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Server Model Characteristics

The Server preemption model lies at one extreme of the latency vs. throughput continuum.

Pros include:

• Simplicity, maturity and robustness make this a very reliable platform

• With no preemption, the reduced number of context switches minimizes system overhead and maximizes overall throughput

Page 24: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Server Model Characteristics

Cons include:

• The lack of preemption results in low average latencies under low loads but much higher latencies when the system is heavily loaded

• The latencies imposed by different execution paths through the kernel result in a wide range of latency durations and low latency determinism

Page 25: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Low Latency Desktop Model Characteristics

The Low Latency Desktop preemption model holds the middle ground in the latency vs. throughput continuum.

Pros include:

• Under low to moderate load, latency range and predictability are significantly improved vs. the Server model

• This preemption model is supported as part of the mainstream kernel and tends to be less trouble-prone than Full RT preemption

Page 26: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Low Latency Desktop Model Characteristics

Cons include:

• The preemption of kernel operations and increased number of context switches create increased overhead and reduced performance relative to the Server model

• The preemption of kernel operations results in higher average latencies vs. the Server preemption model

• This preemption model does not perform as well under heavy system loads as other models

Page 27: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Low Latency Desktop Model - continued

The following software-induced latency sources remain problematic in the Low Latency Desktop preemption model:

• Exceptions, software interrupts, and device service request interrupts execute outside of scheduler control

• Most mutual exclusion locking primitives are subject to priority inversion

• Shared resources use FIFO-based queueing disciplines, meaning high-priority threads may have to wait behind lower-priority threads for access to the resources

These factors result in lower levels of latency determinism.

Page 28: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Full RT Model Characteristics

The Full RT preemption model represents the latency-centric end of the latency vs. throughput continuum. It attempts to mitigate all the remaining software-induced sources of latency:

• Handlers for exceptions, software interrupts, and device service request interrupts are encapsulated inside threads which are under scheduler control

• Priority inheritance is added for most mutual exclusion locking primitives to prevent priority inversions

• Shared resources use priority-based queueing disciplines so that the highest-priority thread always gets first access to the resources
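The difference between FIFO-based and priority-based queueing for a shared resource can be illustrated with a toy model. This is a user-space sketch and not kernel code: the thread names and priorities are hypothetical, and a binary heap stands in for whatever ordered structure a real implementation uses.

```python
import heapq
from collections import deque


def fifo_grant_order(waiters):
    """Grant the resource strictly in arrival order."""
    queue = deque(waiters)
    order = []
    while queue:
        name, _priority = queue.popleft()
        order.append(name)
    return order


def priority_grant_order(waiters):
    """Grant the resource to the highest-priority waiter first.

    Ties are broken by arrival order. Higher numbers mean higher
    priority, as with SCHED_FIFO priorities.
    """
    heap = []
    for arrival, (name, priority) in enumerate(waiters):
        heapq.heappush(heap, (-priority, arrival, name))
    order = []
    while heap:
        order.append(heapq.heappop(heap)[2])
    return order


if __name__ == "__main__":
    # (name, priority) in order of arrival; all values are illustrative.
    waiters = [("logger", 10), ("worker", 20), ("rt_task", 80)]
    print("FIFO:    ", fifo_grant_order(waiters))
    print("priority:", priority_grant_order(waiters))
```

With FIFO queueing the priority-80 thread is served last because it arrived last; with priority queueing it is served first regardless of arrival order.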

Page 29: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Full RT Model Characteristics - continued

The Full RT preemption model inevitably suffers from reduced overall throughput as a consequence of its efforts to maximize latency determinism:

• Schedulable ‘threaded’ ISRs result in the highest levels of preemption and context switch overhead

• Priority inheritance involves iterative logic to temporarily boost the priorities of all lock holders to equal that of the highest-priority lock waiter. This adds significant overhead to locking primitive code.

• Priority-based queueing requires sorting the queue each time a new waiting thread is added
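The iterative boosting that priority inheritance requires can be sketched as a walk along the chain of lock owners. The classes and names below are hypothetical and greatly simplified compared with a real kernel implementation; in particular, restoring priorities on unlock is omitted.

```python
class Lock:
    def __init__(self, name):
        self.name = name
        self.owner = None  # Task currently holding the lock


class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority
        self.effective_priority = priority
        self.blocked_on = None  # Lock this task is waiting for, if any


def block_on(task, lock):
    """Make task wait for lock and boost every owner along the chain.

    Returns the names of the tasks whose priority was raised. Higher
    numbers mean higher priority.
    """
    task.blocked_on = lock
    boosted = []
    owner = lock.owner
    while owner is not None and owner.effective_priority < task.effective_priority:
        owner.effective_priority = task.effective_priority
        boosted.append(owner.name)
        # The owner may itself be waiting for another lock; follow it.
        next_lock = owner.blocked_on
        owner = next_lock.owner if next_lock is not None else None
    return boosted


if __name__ == "__main__":
    lock_a, lock_b = Lock("A"), Lock("B")
    low, mid, high = Task("low", 10), Task("mid", 20), Task("high", 80)
    lock_a.owner = low
    lock_b.owner = mid
    print(block_on(mid, lock_a))   # mid waits for low
    print(block_on(high, lock_b))  # high waits for mid, which waits for low
    print(low.effective_priority, mid.effective_priority)
```

Every blocking operation may have to walk and update a chain of owners like this, which is the extra locking overhead the slide refers to.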

Page 30: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Full RT Model Characteristics - continued

Pros include:

• The most consistent and predictable latency performance available in any preemption model

• The best support environment for creating priority-based multi-layered applications

• The best hard real-time support available in a Linux environment

Page 31: LCA14: LCA14-506: Comparative analysis of preemption vs preempt-rt

Full RT Model Characteristics - continued

Cons include:

• Full RT preemption is supported only with a separately maintained kernel patch set

• The latest supported RT kernel version always lags behind mainstream development

• Mainstream drivers, libraries, and applications may not always function properly in the Full RT environment

• Poorly designed or written real-time threads may starve out threaded interrupt handlers

