
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Date post: 13-Jun-2015
Upload: linaro
Description:
Resource: LCA14
Name: LCA14-306: CPUidle & CPUfreq integration with scheduler
Date: 05-03-2014
Speaker: Daniel Lezcano, Mike Turquette
Video: https://www.youtube.com/watch?v=Ug4uQEYwl5s
Transcript
Page 1: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Wed 5 March, 11:15am, Daniel Lezcano, Mike Turquette

LCA14-306: CPUidle & CPUfreq integration with scheduler

Page 2: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Introduction

● Power aware discussion

● Patchset « Small task packing »
  − Some information shared between cpuidle and the scheduler
  − https://lwn.net/Articles/520857/

● « Line in the sand » by Ingo Molnar
  − First integrate cpuidle and cpufreq with the scheduler
  − http://lwn.net/Articles/552885/

Page 3: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

CPUidle + scheduler: current design

[Diagram: Scheduler → switch_to → Idle task → cpuidle_idle_call → CPUidle, which calls cpuidle_select (Governor) and cpuidle_enter (CPUidle backend driver)]

Page 4: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Idle time measurement

● From the scheduler:
  − The duration the idle task is running
  − Includes the interrupt processing time

● From CPUidle:
  − The duration between interrupts

● CPUidle code runs with local interrupts disabled

● T(idle task) = Σ T(CPUidle) + Σ T(irqs)
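
The identity above can be illustrated with a small accounting sketch. The struct and function names are hypothetical and are not kernel interfaces; the sketch only shows that the idle task's duration, as seen by the scheduler, is the sum of the CPUidle residencies plus the time spent handling interrupts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one run of the idle task (not a kernel structure). */
struct idle_period {
    const uint64_t *cpuidle_ns; /* time spent in each idle state entry */
    const uint64_t *irq_ns;     /* time spent handling each interrupt */
    size_t nr_cpuidle;
    size_t nr_irqs;
};

static uint64_t sum_ns(const uint64_t *v, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* T(idle task) = sum of T(CPUidle) + sum of T(irqs) */
uint64_t idle_task_time_ns(const struct idle_period *p)
{
    assert(p != NULL);
    return sum_ns(p->cpuidle_ns, p->nr_cpuidle) +
           sum_ns(p->irq_ns, p->nr_irqs);
}
```

With three idle residencies and two interrupts, the scheduler's view of the idle time exceeds CPUidle's view by exactly the interrupt processing time.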

Page 5: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Idle time measurement

Page 6: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Idle time measurement unification

● What is the impact of returning to the scheduler each time an interrupt occurs?
  − Scheduler will choose the idle task again if there is nothing to do
  − Mainloop code simplified
  − Idle time measured nearly the same for the scheduler and cpuidle
  − Probably a negative impact on performance, which will need fixing

Page 7: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Load balance

● Taking the decision to balance a task when going to idle
  ■ Use of avg_idle
    ● Does not use how long the cpu will sleep
  ■ The idle state should be selected beforehand
  ■ CPUidle should give the state the cpu will be in

● Balance a task to the idlest cpu
  ■ Does not use the cpu's exit latency
  ■ CPUidle should give back the state the cpu is in
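
As a rough sketch of the second point, here is what choosing the idlest cpu could look like if CPUidle reported the exit latency of each cpu's current idle state. `struct cpu_view` and `pick_idlest_cpu` are illustrative names, not scheduler code, and the policy (lowest exit latency wins) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cpu view; not the kernel's struct rq. */
struct cpu_view {
    int idle;                     /* non-zero if the cpu is idle */
    unsigned int exit_latency_us; /* exit latency of its current idle state */
};

/*
 * Return the index of the idle cpu that is cheapest to wake up,
 * or -1 if no cpu is idle.
 */
int pick_idlest_cpu(const struct cpu_view *cpus, size_t nr_cpus)
{
    int best = -1;

    assert(cpus != NULL || nr_cpus == 0);
    for (size_t i = 0; i < nr_cpus; i++) {
        if (!cpus[i].idle)
            continue;
        if (best < 0 ||
            cpus[i].exit_latency_us < cpus[best].exit_latency_us)
            best = (int)i;
    }
    return best;
}
```

A cpu in a shallow state is preferred over one in a deep state, since waking the latter costs more time and energy.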

Page 8: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

CPUidle main function

● Reduce the distance between the scheduler and the cpuidle framework
  − Move the idle task to kernel/sched
  − Move the cpuidle_idle function into the idle task code
  − Integrate the idle mainloop and cpuidle_idle_call

● Allows access to the scheduler's private structure definitions

Page 9: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Menu governor split

● The events can be classified into three categories:
  1. Predictable → timers
  2. Repetitive → IOs
  3. Random → keystroke, incoming packet

● Category 2 could be integrated into the scheduler

Page 10: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

IO latency tracking

● IOs repeat within a reasonable enough interval to be treated as predictable

Page 11: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

IO latency tracking

● Measurement from the scheduler
  − io_schedule
  − io_schedule_timeout

● Track the IO latency per task
  − Task migration moves the IO history, unlike the current governor
  − Latency constraint for the task
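
A minimal sketch of per-task tracking follows. The slides do not say how the history is stored, so the running average (weight 1/8 for each new sample) is an assumption chosen for illustration, and `struct task_io_stats` is a hypothetical structure rather than a field of `task_struct`. The point is that the history lives with the task, so it follows the task when it migrates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task IO history; it would travel with the task. */
struct task_io_stats {
    uint64_t avg_latency_ns; /* running average of time blocked on IO */
    uint64_t nr_samples;
};

/*
 * Account one IO wait, e.g. the time between entering io_schedule()
 * and being woken up. The first sample seeds the average; later
 * samples are blended in with weight 1/8 (an assumed policy).
 */
void task_io_account(struct task_io_stats *st, uint64_t blocked_ns)
{
    assert(st != NULL);
    if (st->nr_samples == 0)
        st->avg_latency_ns = blocked_ns;
    else
        st->avg_latency_ns = (st->avg_latency_ns * 7 + blocked_ns) / 8;
    st->nr_samples++;
}

/* Expected duration of the task's next IO wait, 0 if no history yet. */
uint64_t task_io_expected_ns(const struct task_io_stats *st)
{
    assert(st != NULL);
    return st->nr_samples ? st->avg_latency_ns : 0;
}
```

The expected wait could then feed the idle state decision for the cpu the task blocked on.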

Page 12: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Combining information

● Move the predictable event framework into the scheduler

● Information combined between the scheduler and the menu governor will be more accurate
  − Idle balance decision based on the idle state a cpu is in or is about to enter
  − Load tracking from the task for idle state exit latency
  − CPU computation power and topology
  − DVFS strategies for a boost on idle state exit

Page 13: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Scheduler + CPUidle

● The scheduler should have all the information needed to tell CPUidle:
  − How long it will sleep
  − What the latency constraint is

● CPUidle should use the information provided by the scheduler:
  − Select an idle state
  − Use the backend driver idle callback
  − No more heuristics
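
Given those two inputs, state selection reduces to a table lookup. The sketch below is an assumption-laden illustration, not the cpuidle implementation: `struct idle_state` is a simplified stand-in carrying only an exit latency and a target residency, and the table is assumed to be sorted from shallowest to deepest with both values increasing.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified idle state description (illustrative only). */
struct idle_state {
    unsigned int exit_latency_us;     /* time needed to leave the state */
    unsigned int target_residency_us; /* minimum sleep that makes it pay off */
};

/*
 * Pick the deepest state whose target residency fits the expected sleep
 * and whose exit latency respects the latency constraint. States are
 * assumed sorted shallowest first, with both fields increasing, so the
 * scan can stop at the first state that does not fit. State 0 is the
 * fallback.
 */
size_t select_idle_state(const struct idle_state *states, size_t nr_states,
                         unsigned int expected_sleep_us,
                         unsigned int latency_req_us)
{
    size_t chosen = 0;

    assert(states != NULL && nr_states > 0);
    for (size_t i = 1; i < nr_states; i++) {
        if (states[i].target_residency_us > expected_sleep_us)
            break;
        if (states[i].exit_latency_us > latency_req_us)
            break;
        chosen = i;
    }
    return chosen;
}
```

No prediction happens here: the function trusts the sleep length and latency constraint it is handed, which is the role the slide assigns to the scheduler.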

Page 14: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Status

● A lot of cleanups around the idle mainloop

● CPUidle main function inside the idle mainloop
  − Code distance reduced, structures shared between scheduler and cpuidle
  − Communication between sub-systems made easier

Page 15: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Work in progress

● First iteration of IO latency tracking implemented
  − Validation in progress

● Simple governor for CPUidle
  − Select a state

● Idle time unification experimentation

Page 16: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

CPUfreq + scheduler

The title is misleading … CPUfreq may completely disappear in the future.

Page 17: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Goal is to initiate CPU dynamic voltage & frequency scaling (DVFS) from the Linux scheduler

Page 18: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Nobody knows what this will look like, so please ask questions and raise suggestions

Page 19: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

CPUfreq today

• Polling workqueue
  − E.g. ondemand

• Based on idle time / busyness

• No relation to decisions taken by the scheduler

• Task may be run at any time

• No relation to idle task
  − In fact, task will not wake up during idle

Page 20: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Event driven behavior

• Replace polling loop with event driven action

• Scheduler already takes action which affects available compute capacity
  − Load balance
  − Migrating tasks to and from CPUs of different compute capacity

• DVFS transitions are a natural fit

Page 21: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Lots of work ahead

• Method to initiate CPU DVFS transitions from the scheduler

• Identify call sites to initiate those transitions
  − Enqueue/dequeue task
  − Load balance
  − Idle entry/exit
  − Aggressively schedule deadline tasks
  − Maybe others

• Define interface between the scheduler & the DVFS thingy
  − Currently a power driver in Morten’s RFC
  − Remove CPUfreq governor layer from the power driver completely?

Page 22: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Lots of work ahead, part 2

• Experiment with policy
  − When and where to evaluate if frequency should be changed
  − What metrics are important to the algorithm?
  − DVFS versus race-to-idle

• Integrate with power model

• Benchmark performance & power
  − Performance regressions
  − Does it save power?

• Make it work with non-CPUfreq things like PSCI and ACPI for changing CPU P-state

Page 23: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Morten’s power aware scheduling RFC

• https://lkml.org/lkml/2013/10/11/547

• Replaces polling loop in CPUfreq governor with scheduler event-driven action

• CPUfreq machine drivers are re-used initially

• CPUfreq governor becomes a shim layer to the power driver

Page 24: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Nitty gritty details

• DVFS task is itself scheduled on a workqueue
  − Might not be run for some time after the scheduler determines that a DVFS transition should happen

• Kworker threads are filtered out
  − Prevents infinite reentrancy into the scheduler
  − CPU capacity is not changed when enqueuing and dequeuing these tasks

Page 25: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

include/linux/sched/power.h

    struct power_driver {
        /*
         * Power driver calls may happen from scheduler context with irq
         * disabled and rq locks held. This must be taken into account in
         * the power driver.
         */

        /* cpu already at max capacity? */
        int (*at_max_capacity) (int cpu);
        /* Increase cpu capacity hint */
        int (*go_faster) (int cpu, int hint);
        /* Decrease cpu capacity hint */
        int (*go_slower) (int cpu, int hint);
        /* Best cpu to wake up */
        int (*best_wake_cpu) (void);

        /* Scheduler call-back without rq lock held and with irq enabled */
        void (*late_callback) (int cpu);
    };
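
To make the interface concrete, here is a user-space toy that fills in a subset of these callbacks and drives them from hypothetical enqueue/dequeue hooks. Everything except the three callback signatures is an assumption: the `toy_*` functions, the `on_enqueue`/`on_dequeue` hooks, the discrete capacity levels, and the way `hint` is ignored (the slide does not define its meaning).

```c
#include <assert.h>
#include <stddef.h>

/* Subset of struct power_driver from the slide. */
struct power_driver {
    int (*at_max_capacity)(int cpu);
    int (*go_faster)(int cpu, int hint);
    int (*go_slower)(int cpu, int hint);
};

#define NR_CPUS   4
#define MAX_LEVEL 3

/* Toy per-cpu capacity level, 0..MAX_LEVEL (stands in for an OPP index). */
static int level[NR_CPUS];

static int toy_at_max(int cpu)
{
    return level[cpu] >= MAX_LEVEL;
}

static int toy_faster(int cpu, int hint)
{
    (void)hint; /* meaning of the hint is not specified in the slides */
    if (level[cpu] < MAX_LEVEL)
        level[cpu]++;
    return 0;
}

static int toy_slower(int cpu, int hint)
{
    (void)hint;
    if (level[cpu] > 0)
        level[cpu]--;
    return 0;
}

const struct power_driver toy_driver = {
    .at_max_capacity = toy_at_max,
    .go_faster = toy_faster,
    .go_slower = toy_slower,
};

/* Hypothetical call site: a task was enqueued on `cpu`. */
void on_enqueue(const struct power_driver *drv, int cpu, int hint)
{
    assert(drv != NULL && cpu >= 0 && cpu < NR_CPUS);
    if (!drv->at_max_capacity(cpu))
        drv->go_faster(cpu, hint);
}

/* Hypothetical call site: a task was dequeued from `cpu`. */
void on_dequeue(const struct power_driver *drv, int cpu, int hint)
{
    assert(drv != NULL && cpu >= 0 && cpu < NR_CPUS);
    drv->go_slower(cpu, hint);
}

int toy_level(int cpu)
{
    return level[cpu];
}
```

In the kernel the callbacks may run with interrupts disabled and rq locks held, as the comment in the struct warns, so a real driver could not sleep or perform a slow transition inline the way this toy does.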

Page 26: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Incremental changes on top

• https://github.com/mturquette/linux/commits/sched-cpufreq

• Replaced workqueue method with per-CPU kthread
  − This allows removal of the kworker filter
  − Please commence bikeshedding over the name of this kthread

• Use SCHED_FIFO policy for the task
  − Will be run before the normal work (right?)

• These patches were just validated yesterday
  − Bugs
  − Holes in logic
  − Misunderstandings
  − Voided warranties

Page 27: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

What’s next?

• Gather more opinions on the power driver interface

• Is go_faster/go_slower the right way?
  − Spoiler alert: Probably not.

• When else might we want to evaluate CPU frequency?
  − Idle entry/exit as mentioned by Daniel
  − Cluster-level considerations

• Sched domains
  − Not just per-core
  − Four Cortex-A9s with a single CPU clock

• Coordinate with the power model work

Page 28: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

Questions?

Page 29: LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler

More about Linaro Connect: http://connect.linaro.org
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
Linaro members: www.linaro.org/members

