HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Date post: 18-Jul-2015
Transcript
Page 1: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

HKG15-107 ACPI Power Management on ARM64 Servers

Presented by: Ashwin Chaugule, Linaro Enterprise Group

Date: 2/9/2015

Page 2: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Overview
● CPU Performance management
  ○ CPPC (Collaborative Processor Performance Control)
  ○ PCC (Platform Communication Channel)
  ○ State of patchwork
  ○ Next steps
● CPU idle management overview
● Device power management overview

Page 3: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

um. Hello there!

Page 4: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Power Management overview
● Overall goal is to run the system as efficiently as possible, considering both power and performance
● Active power management
  ● Minimize power when the system is active and running
● Idle power management
  ● Go to the deepest possible idle state with the most power savings while considering the workload's desired response time
● Limits management
  ● Deliver max possible performance within the system constraints
● Servers are plugged in and not backed by batteries
  ○ Cost of power is significant in TCO
● Server workloads typically have a high dynamic range of CPU utilization
  ● Bursts of activity depending on time zones, holiday sales etc.
  ● Not always running at peak CPU utilization
● Need to be very efficient across the whole range

Page 5: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

CPU Performance Management
● CPPC = Collaborative Processor Performance Control
● New method to manage CPU performance
● Defined in ACPI v5.0 and later
● Preferred method for ARM64 servers over PSS
● Richer interface supersedes ~12 ACPI objects and notifications
● Performance requests are made on an abstract, unitless and continuous scale
● Firmware on the remote processor is free to interpret values however it wants
  ○ Can choose to map the unit to CPU freq., similar to “p-states”
  ○ Could be a combination of freq + other architecture specific performance knobs
● Handling in firmware avoids the risk of freq transitions being preempted in the kernel
● Also allows for much wider portability
● OS should not assume any specific meaning for the performance scale
● Per CPU table (CPC) describes each CPU's performance capabilities and controls
● Contents of table can be registers (h/w, memory mapped or PCC) or static integers
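To make the abstract scale concrete, here is a minimal sketch of one way firmware might interpret a request. The linear mapping, the function name `perf_to_freq_khz` and the frequency bounds are all assumptions for illustration; the slide only says firmware is free to choose its own interpretation.

```python
def perf_to_freq_khz(perf, lowest_perf, highest_perf, min_khz, max_khz):
    """Hypothetical firmware-side mapping from the abstract CPPC scale to kHz.

    Linear interpolation is an assumption; real firmware may combine
    frequency with other architecture specific knobs.
    """
    if not lowest_perf <= perf <= highest_perf:
        raise ValueError("request outside the advertised performance range")
    span = highest_perf - lowest_perf
    if span == 0:
        return max_khz
    return min_khz + (perf - lowest_perf) * (max_khz - min_khz) // span
```

The OS side never sees the kHz value; it only works with the unitless numbers, which is what keeps the interface portable.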

Page 6: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Alternate method
● PSS = Performance Supported States
  ○ Discretized table of CPU frequencies
  ○ Assumes all CPUs have identical P states
● Requires x86-like mechanisms to write to a register to change CPU frequency
● Processor Throttling Controls
  ○ PTC, TSS, TPC
  ○ Throttling states available to the CPU as a percentage of max
● Needs ARM specific spec updates
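For contrast with the continuous CPPC scale, a discretized P-state table could be searched as in the sketch below. The table contents and the `pick_pstate` helper are hypothetical, not taken from any platform.

```python
# Hypothetical P-state table, fastest first (index 0 plays the role of P0).
PSS_TABLE_MHZ = [2400, 2000, 1600, 1200, 800]


def pick_pstate(target_mhz, table=PSS_TABLE_MHZ):
    """Return the index of the slowest P-state that still meets target_mhz.

    Falls back to index 0 (fastest) when no entry is fast enough.
    """
    best = 0
    for idx, freq in enumerate(table):
        if freq >= target_mhz:
            best = idx  # table is descending, so later matches are slower
    return best
```

A request for 1300 MHz has to round up to the 1600 MHz entry, which is the granularity loss a continuous scale avoids.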

Page 7: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

CPPC high level flow
● Platform enumerates CPU performance range to the OS
● Highest Performance:
  ○ Highest performance capability of a CPU
● Nominal Performance:
  ○ Max sustained perf level
● Lowest Nonlinear Performance:
  ○ Lowest perf level at which non-linear power savings are achievable. Running below this level could be suboptimal
● Lowest Performance:
  ○ Lowest perf capability
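The four enumerated levels can be held in a small record, as sketched below. The class name `CppcCaps`, its field names and the ordering check are assumptions for illustration; the ordering is what the definitions above imply rather than something this slide states outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CppcCaps:
    """Per-CPU capability levels on the abstract CPPC scale (illustrative)."""
    highest_perf: int
    nominal_perf: int
    lowest_nonlinear_perf: int
    lowest_perf: int

    def is_consistent(self) -> bool:
        # Assumed ordering: lowest <= lowest nonlinear <= nominal <= highest.
        return (self.lowest_perf <= self.lowest_nonlinear_perf
                <= self.nominal_perf <= self.highest_perf)

    def efficient_floor(self) -> int:
        # Requests below this level may cost more energy than they save.
        return self.lowest_nonlinear_perf
```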

Page 8: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

CPPC high level flow
● OS requests desired performance
● Maximum Performance:
  ○ Upper bound on desired performance
● Desired Performance:
  ○ Ideal desired perf level
● Performance Reduction Tolerance:
  ○ Deviation below Desired Performance at which the platform is allowed to run. If the OS requests Desired perf over a specific Time Window, then this is the average performance to be delivered over the Time Window. The Time Window is specified in another register.
● Minimum Performance:
  ○ Lower bound on desired performance
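On the OS side, a request might be sanitised before it is written to the control registers. This is only a sketch: `clamp_request` is a hypothetical helper, and clamping against the platform's lowest/highest levels is an assumed policy.

```python
def clamp_request(desired, minimum, maximum, lowest_perf, highest_perf):
    """Return (minimum, maximum, desired) clamped to the platform range.

    All values are on the abstract CPPC scale. Hypothetical helper: the
    policy of clamping rather than rejecting is an assumption.
    """
    lo = max(minimum, lowest_perf)
    hi = min(maximum, highest_perf)
    if lo > hi:
        raise ValueError("minimum exceeds maximum after clamping")
    return lo, hi, min(max(desired, lo), hi)
```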

Page 9: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Other CPPC feedback regs
● Platform may be aware of power budgets and thermal constraints
● It can limit delivered performance by reading instantaneous values of specific sensors or counters
● Provides notification back to OS when limits change
● Reference Performance Counter:
  ○ Counts at fixed rate when processor is active
● Delivered Performance Counter:
  ○ Counts at rate of current performance level taking Desired into account
● Guaranteed Performance:
  ○ Sustained Performance level deliverable by Platform given current constraints
  ○ Raises a notification when this level changes
● Performance Limited Register:
  ○ In the event of some constraint (e.g. thermal excursion), this reg has 2 bits defined that indicate the platform unexpectedly delivered less than Desired or less than Minimum performance.
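The two feedback counters can be combined into a delivered-performance estimate by sampling both twice and scaling by a reference level. The sketch below assumes a `reference_perf` value (not listed on the slide), free-running counters that wrap at `counter_bits`, and bit positions for the Performance Limited register that follow my reading of the ACPI spec and should be checked against it.

```python
DESIRED_EXCURSION = 1 << 0   # assumed bit: delivered less than Desired
MINIMUM_EXCURSION = 1 << 1   # assumed bit: delivered less than Minimum


def delivered_perf(reference_perf, ref_prev, ref_now, del_prev, del_now,
                   counter_bits=64):
    """Estimate average delivered performance between two counter samples.

    Returns None when the reference counter did not advance (for example
    the CPU was idle for the whole interval).
    """
    mask = (1 << counter_bits) - 1
    d_ref = (ref_now - ref_prev) & mask   # masking handles wraparound
    d_del = (del_now - del_prev) & mask
    if d_ref == 0:
        return None
    return reference_perf * d_del // d_ref


def decode_perf_limited(value):
    """Split the Performance Limited register into its two flags."""
    return {
        "desired_excursion": bool(value & DESIRED_EXCURSION),
        "minimum_excursion": bool(value & MINIMUM_EXCURSION),
    }
```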

Page 10: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Per CPU CPPC descriptor
● Each entry of descriptor is either an integer or a register
● Register could be described as a hardware register, System I/O or PCC register
● PCC registers have the following format:

Page 11: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

PCC: Platform Communication Channel
● ACPI v5.0+ defines a mailbox-like mechanism for OS to communicate with a remote processor (e.g. BMC) and back
● ACPI table for PCC (PCCT) defines a list of PCC subspaces/channels
● Each subspace entry defines:
  ○ Shared communication region address
  ○ Command and status fields for this region
  ○ Doorbell semantics for channel
● PCC commands are client specific
  ○ Clients defined in the current ACPI v5.1 spec include
    ■ CPPC
    ■ MPST (Memory node power state table)
    ■ RAS
● Doorbell protocol defines exclusivity of access to PCC channel between OS and remote processor
● Supports async mode of notification from remote via IRQ
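The command and status fields sit at the start of the shared communication region. The sketch below packs and parses such a header; the exact layout (32-bit signature, 16-bit command, 16-bit status, little-endian) and the status bit positions reflect my understanding of the ACPI generic communications subspace and are assumptions to verify against the spec. The helper names are hypothetical.

```python
import struct

PCC_SIGNATURE_BASE = 0x50434300        # "PCC" in the upper three bytes (assumed)
HEADER = struct.Struct("<IHH")         # signature, command, status (assumed layout)
STATUS_CMD_COMPLETE = 1 << 0           # assumed bit position
STATUS_ERROR = 1 << 2                  # assumed bit position


def pack_header(subspace_id, command, status=0):
    """Build the header bytes for a given subspace and command."""
    return HEADER.pack(PCC_SIGNATURE_BASE | subspace_id, command, status)


def parse_header(buf):
    """Decode the header at the start of a shared memory region."""
    signature, command, status = HEADER.unpack_from(buf)
    if (signature & 0xFFFFFF00) != PCC_SIGNATURE_BASE:
        raise ValueError("not a PCC shared memory region")
    return {
        "subspace_id": signature & 0xFF,
        "command": command,
        "complete": bool(status & STATUS_CMD_COMPLETE),
        "error": bool(status & STATUS_ERROR),
    }
```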

Page 12: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

PCC: High level flow
● PCC Reads:
  ○ Client acquires a PCC channel lock (client specific)
  ○ Rings doorbell with READ cmd
    ■ Client waits for command completion
  ○ Client reads data updated by remote processor in comm space
  ○ Client releases PCC channel lock
● PCC Writes:
  ○ Client acquires a PCC channel lock (client specific)
  ○ Client writes data to comm space
  ○ Rings doorbell with WRITE cmd
    ■ Client waits for command completion
  ○ Client releases PCC channel lock
● If command completion fails, Client must retry or assume failure
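The read and write sequences above can be walked through with a toy in-process model. Everything here is illustrative: `FakeRemote` stands in for the remote processor, the comm space is a plain dict, the command values are placeholders, and the doorbell is a synchronous method call rather than a register write.

```python
import threading

CMD_READ, CMD_WRITE = 0x00, 0x01       # placeholder command codes


class FakeRemote:
    """Stand-in for the remote processor with its own private registers."""

    def __init__(self):
        self.regs = {"desired_perf": 0}

    def ring_doorbell(self, channel):
        if channel.command == CMD_READ:
            channel.comm_space.update(self.regs)      # publish current values
        elif channel.command == CMD_WRITE:
            self.regs.update(channel.comm_space)      # consume the request
        channel.command_complete = True


class PccChannel:
    def __init__(self, remote):
        self.lock = threading.Lock()   # the client specific channel lock
        self.comm_space = {}
        self.command = None
        self.command_complete = False
        self.remote = remote

    def _send(self, command):
        self.command = command
        self.command_complete = False
        self.remote.ring_doorbell(self)
        if not self.command_complete:  # caller may retry or assume failure
            raise TimeoutError("PCC command did not complete")

    def read(self, name):
        with self.lock:                # acquire, ring, read, release
            self._send(CMD_READ)
            return self.comm_space[name]

    def write(self, name, value):
        with self.lock:                # acquire, write, ring, release
            self.comm_space[name] = value
            self._send(CMD_WRITE)
```

A real client would poll the status field or wait for the completion IRQ instead of relying on a synchronous call.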

Page 13: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Linux support for CPPC + PCC
● PCC
  ○ Integrated as mailbox controller
  ○ Initial patchwork in upstream kernels today (3.19-rcX)
● CPPC
  ○ CPPC parsing methods abstracted into separate files
  ○ CPUFreq driver that plugs into existing governors (e.g. ondemand)
    ■ ondemand ignores CPU freq., which could lead to suboptimal choice of next freq
    ■ Patchwork (v4) with CPUFreq integration under review
  ○ Investigating PID style governor
    ■ Early patchwork adapted governor from intel_pstate
    ■ Experiments on ARM64 led to extensive modifications in the way CPU busy is calculated
      ● Frequency weighted CPU busyness including idle time
      ● Move busyness math to workqueue
    ■ intel_pstate PID suboptimally raises next freq request if workload doesn’t cause timer to defer > 30ms
    ■ Need more experimentation on silicon
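A PID style governor fed by frequency weighted busyness could look roughly like the sketch below. It is not the patchwork under review and not the intel_pstate algorithm; the setpoint, the gains and both names (`weighted_busy_pct`, `PidGovernor`) are hypothetical.

```python
def weighted_busy_pct(busy_ns, idle_ns, cur_perf, highest_perf):
    """Busy percentage scaled by the current level relative to the highest.

    Including idle time in the denominator and weighting by performance
    level keeps the figure comparable across operating points.
    """
    total = busy_ns + idle_ns
    if total == 0:
        return 0.0
    return 100.0 * (busy_ns / total) * (cur_perf / highest_perf)


class PidGovernor:
    """Toy PID controller that steers busyness toward a setpoint."""

    def __init__(self, setpoint=60.0, kp=0.5, ki=0.1, kd=0.0):
        self.setpoint, self.kp, self.ki, self.kd = setpoint, kp, ki, kd
        self.integral = 0.0
        self.last_error = 0.0

    def next_perf(self, busy_pct, cur_perf, lowest_perf, highest_perf):
        error = busy_pct - self.setpoint
        self.integral += error
        derivative = error - self.last_error
        self.last_error = error
        adjust = (self.kp * error + self.ki * self.integral
                  + self.kd * derivative)
        # Clamp the new request to the advertised performance range.
        return max(lowest_perf, min(highest_perf, round(cur_perf + adjust)))
```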

Page 14: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

CPPC + PCC

[Block diagram spanning LINUX and the Remote Processor. Labelled blocks: CPUFreq governors, CPPC CPUFreq driver, CPPC driver with inbuilt governor, CPPC lib, PCC driver, CPPC tables, PCCT table, Hardware registers / System I/O, PCC firmware interface, CPU Performance handlers.]

Page 15: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

CPU idle management overview
● As of current spec (v5.1)
● C states defined for each processor
  ○ C0 - On
  ○ C1 - Cn -> ascending order of idleness
● C state object for each processor
● Each object defines attributes for that idle state
● _CSD object for each processor defines C state cross dependency
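As an illustration of how per-state attributes feed the idle decision, the sketch below picks the deepest state whose exit latency fits a response-time budget. The `CState` record, its field names and the selection policy are assumptions; real governors also weigh predicted idle duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    """Illustrative idle state attributes."""
    name: str
    exit_latency_us: int
    power_mw: int


def deepest_allowed(states, latency_budget_us):
    """Pick the deepest state whose exit latency fits the budget.

    `states` must be ordered shallow to deep; the shallowest state is
    always allowed as a fallback.
    """
    choice = states[0]
    for state in states[1:]:
        if state.exit_latency_us <= latency_budget_us:
            choice = state
    return choice
```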

Page 16: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

CPU idle management overview
● _CST and _CSD don’t scale well to heterogeneous architectures
● Assume same number of power states at each processor
● Can’t express Device power state dependencies
● Can’t express power resource dependencies
● No notion of effect on caches at each level of hierarchy
● WIP to address shortcomings in the spec
● Plan to use existing governors + PSCI methods

Page 17: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Device PM overview
● Devices may define Dx states
  ○ D0 - ON
  ○ D3 - OFF
  ○ D1/D2 - possible intermediate states
  ○ D3hot - Off (like D3) but may remain enumerable and context preserved.
● Platform specific details handled inside PSx control methods
  ○ Called as needed by OSPM as the device transitions through Dx states
● Power Resources handled in PR objects
  ○ Each PR supports: ON, OFF and STA (status) methods
  ○ Devices have PRx lists which reference power resources as needed in Dx states
● 2 options to do device pm:
  ○ Manage power resources inside PSx. Called on entry to Dx state
  ○ Declare PR separately with its own ON, OFF
    ■ Define device dependencies and let OSPM manage ON/OFF
● Should not have to rely on clk/reg framework in Linux

Page 18: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Device PM state transitions
● Device state transitions
  1. Device wakeup (due to user request or interrupt)
    a) If device depends on a power resource, must turn on all required power resources prior to enabling the device.
  2. Keep alive if there are ongoing requests
  3. Device inactive (no device requests for some time)
● Power Resources track all dependent devices (multiple devices may share the same power resource)
● Power Resource state transitions
  A. All dependent devices are inactive (D3)
  B. A dependent device is attempting wakeup
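The shared power resource behaviour above amounts to tracking which dependent devices are active. The sketch below models it with hypothetical `PowerResource` and `Device` classes; the boolean `on` flag stands in for invoking the resource's ON and OFF methods.

```python
class PowerResource:
    """Tracks dependent devices; on while any of them is active."""

    def __init__(self, name):
        self.name = name
        self.on = False
        self.active_devices = set()

    def get(self, device_name):
        self.active_devices.add(device_name)
        if not self.on:
            self.on = True             # transition B: a device is waking up

    def put(self, device_name):
        self.active_devices.discard(device_name)
        if self.on and not self.active_devices:
            self.on = False            # transition A: all dependents in D3


class Device:
    def __init__(self, name, resources):
        self.name = name
        self.resources = resources
        self.state = "D3"

    def wake(self):
        for res in self.resources:     # power up resources before the device
            res.get(self.name)
        self.state = "D0"

    def suspend(self):
        self.state = "D3"
        for res in reversed(self.resources):
            res.put(self.name)
```

With two devices on one rail, the rail stays on until both have suspended.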

Page 19: HKG15-107: ACPI Power Management on ARM64 Servers (v2)

Device PM example

Page 20: HKG15-107: ACPI Power Management on ARM64 Servers (v2)
