POWER MANAGEMENT AND ENERGY EFFICIENCYcsl.skku.edu/uploads/ECE5658S17/week5.pdf · 2017-03-31 ·...

POWER MANAGEMENT AND ENERGY EFFICIENCY

2017 Operating Systems Design
Euiseong Seo

* Adapted from “Power Management for Embedded Systems,” Minsoo Ryu

Need for Power Management

- Power consumption matters
- PCs
  - Energy cost
  - Thermal dissipation
- Mobile devices
  - Battery lifetime
  - Thermal dissipation
- Server systems
  - Energy cost
  - Electrical infrastructure
  - Power usage effectiveness

Power and Performance

- Power ∝ voltage² × clock
- Clock ∝ voltage
- Therefore, Power ∝ clock³
- Server processors have already reached a TDP of 150 watts
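The cubic relation can be checked with a few lines of arithmetic. The sketch below only assumes the two proportionalities stated on this slide (power ∝ voltage² × clock, and clock ∝ voltage); the function name `relative_power` is illustrative.

```python
def relative_power(freq_ratio: float) -> float:
    """Relative dynamic power when the clock is scaled by freq_ratio.

    Assumes Power ∝ voltage^2 * clock and Clock ∝ voltage, so the
    voltage is scaled by the same ratio as the clock.
    """
    voltage_ratio = freq_ratio               # Clock ∝ voltage
    return voltage_ratio ** 2 * freq_ratio   # Power ∝ voltage^2 * clock


for ratio in (1.0, 0.8, 0.5):
    print(f"clock x{ratio:.1f} -> power x{relative_power(ratio):.3f}")
```

Under these assumptions, halving the clock cuts dynamic power to one eighth, which is why frequency scaling is attractive whenever full speed is not needed.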

Power Consumers in a System

- Processors
  - Dominate power consumption
  - Usually consume about 100 watts out of 300 watts
- Memory
  - Significant contributor
  - In a server system, memory consumes 19% of system power on average
  - Some work reports up to 40% of total system power

Power Consumers in a System

- Storage
  - A server HDD consumes 5 to 10 watts
  - A laptop HDD consumes 1 to 5 watts
  - A SATA SSD consumes 1 to 5 watts
  - An NVMe SSD consumes 30~ watts
- NIC
  - A 10 Gbps NIC consumes 5 to 20 watts
- Peripherals
  - Insignificant

Idle Power Consumption

- Only 30% of servers in data centers are fully utilized, while the other 70% are kept idle
  - Idle servers consume between 60% and 66% of the peak load power consumption
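To see what these percentages imply, the following sketch estimates the share of fleet power drawn by idle servers. It assumes every server has the same peak power and that a fully utilized server draws exactly its peak; both are simplifications, and the function name is illustrative.

```python
def idle_power_share(busy_fraction: float = 0.30, idle_ratio: float = 0.60) -> float:
    """Fraction of total fleet power consumed by idle servers.

    busy_fraction: share of servers that are fully utilized.
    idle_ratio: idle power as a fraction of peak power.
    Assumes identical servers, with busy servers drawing peak power (1.0).
    """
    busy_power = busy_fraction * 1.0
    idle_power = (1.0 - busy_fraction) * idle_ratio
    return idle_power / (busy_power + idle_power)


for ratio in (0.60, 0.66):
    print(f"idle at {ratio:.0%} of peak -> {idle_power_share(0.30, ratio):.1%} of fleet power")
```

With the figures from the slide, idle servers account for roughly 58% to 61% of the fleet's power under this model.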

Two Dimensions on Power Management

- Power management when the system is idle
  - Select the most efficient idle state
- Power management when the system is active
  - Dynamically change operating frequency and/or voltage

APM and ACPI

- APM (Advanced Power Management)
  - Activated when the system becomes idle
    - Screen saver → sleep → suspend
  - Controlled by firmware (BIOS)
    - Reconfiguration requires a reboot
  - The OS has no knowledge of it
- ACPI (Advanced Configuration and Power Interface)
  - Controlled by the OS
  - First released in 1996 by Intel, Microsoft, and Toshiba

ACPI

- Standard interface specification
  - Brings power management under the control of the operating system
  - The specification is central to Operating System-directed configuration and Power Management (OSPM)

[Figure: ACPI layering, with labels Applications; OS Power Management; Software drivers and ACPI; Hardware: CPU, BIOS, etc.]

ACPI Functions

- System power management
  - The entire computer
- Processor power management
  - When the OS is idle but not sleeping, it puts processors in low-power states
- Device power management
  - ACPI tables describe motherboard devices, their power states, and the power planes the devices are connected to

Firmware-Level ACPI Architecture

- Three components
  - ACPI tables
    - Contain definition blocks that describe all the hardware that can be managed through ACPI
    - Include both data and machine-independent bytecode (AML, the ACPI Machine Language)
    - The OS must have an interpreter for the AML bytecode
  - ACPI BIOS
    - Performs basic management operations on the hardware
    - Includes code to help boot the system and to put the system to sleep or wake it up
  - ACPI registers
    - A set of hardware management registers defined by the ACPI specification

Firmware-Level ACPI Architecture

[Figure: firmware-level ACPI architecture]

ACPI States

[Figure: ACPI states]

Global States

- G0: Working (S0)
  - Processor power states (C-state): C0, C1, C2, C3
- G1: Sleeping (e.g., suspend, hibernate)
  - Sleep states (S-state): S1, S2, S3, S4
- G2: Soft off (S5)
  - Almost the same as G3 (mechanical off), except that the power supply unit (PSU) still supplies a minimum of power
  - Other components may remain powered so the computer can "wake" on input from the keyboard, clock, modem, LAN, or USB device
- G3: Mechanical off
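The relationship between sleep states and global states can be written down as a small lookup table. This is only an illustrative sketch: the enum and function names are made up here, and the mapping follows the usual ACPI convention that S0 is the working state, S1 to S4 are sleeping states, and S5 is soft off.

```python
from enum import Enum


class GlobalState(Enum):
    G0 = "Working"
    G1 = "Sleeping"
    G2 = "Soft off"
    G3 = "Mechanical off"


# S0 is the working state, S1-S4 are sleeping states, S5 is soft off.
# G3 (mechanical off) has no S-state because no power is supplied.
SLEEP_TO_GLOBAL = {
    "S0": GlobalState.G0,
    "S1": GlobalState.G1,
    "S2": GlobalState.G1,
    "S3": GlobalState.G1,
    "S4": GlobalState.G1,
    "S5": GlobalState.G2,
}


def global_state_of(sleep_state: str) -> GlobalState:
    """Return the global state that contains the given S-state."""
    return SLEEP_TO_GLOBAL[sleep_state]


print(global_state_of("S3").value)
```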

Processor States (C-State)

- Global state is G0 (working)
- Four processor states
  - C0: Operating
    - Performance state (P-State)
    - P0: highest performance, highest power
    - P1 ~ Pn: lower performance, lower power
  - C1: Halt
    - The processor is not executing instructions, but can return to an executing state essentially instantaneously
  - C2: Stop-Clock (optional)
    - The processor maintains all software-visible state, but may take longer to wake up
  - C3: Sleep (optional)
    - The processor does not need to keep its cache coherent, but maintains other state

Performance States (P-State)

- Intel Pentium M at 1.6 GHz

[Figure: performance states of the Intel Pentium M at 1.6 GHz]

Device States (D-State)

- The device states D0–D3 are device-dependent
  - D0: Fully On
    - The operating state
  - D1 and D2
    - Intermediate power states whose definition varies by device
  - D3: Off
    - The device is powered off and unresponsive to its bus
    - D3 Hot: Aux power is provided
    - D3 Cold: No power provided

Sleeping States (S-State)

- Four sleeping states
  - S1: Power on Suspend (POS)
    - All the processor caches are flushed
    - The power to the CPU(s) and RAM is maintained
    - Wakeup takes about 1 ~ 2 seconds on desktops
  - S2: CPU powered off
    - Dirty cache is flushed to RAM (this state is often not used)
  - S3: Suspend to RAM (STR), or Standby, Sleep
    - RAM remains powered
    - Wakeup takes about 3 ~ 5 seconds on desktops
  - S4: Suspend to Disk (STD) or hibernation
    - All content of the main memory is saved to non-volatile memory such as a hard drive, and the system is powered down

Dynamic Voltage and Frequency Scaling

- Adjusts clock speed and operating voltage dynamically
- Provided by most modern processors
- Low clock-switching overhead, usually within a few µs

[Figure: two plots of performance over time, each with a deadline marked]
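A rough calculation illustrates why running just fast enough to meet a deadline can save energy compared with finishing early at full speed. The sketch below assumes the cubic model (power ∝ clock³), ignores idle and static power and the switching overhead, and uses hypothetical numbers; none of the names come from the slides.

```python
def lowest_frequency_for_deadline(cycles: float, deadline_s: float) -> float:
    """Lowest clock (Hz) that still finishes `cycles` of work by the deadline."""
    return cycles / deadline_s


def dynamic_energy(cycles: float, freq_hz: float, k: float = 1.0) -> float:
    """Energy = power * time, with power = k * f^3 (assumed cubic model)
    and time = cycles / f. Idle and static power are ignored."""
    power = k * freq_hz ** 3
    time_s = cycles / freq_hz
    return power * time_s


# Hypothetical task: 1e9 cycles, 2 s deadline, 1 GHz maximum clock.
cycles, deadline_s, f_max = 1e9, 2.0, 1e9
f_low = lowest_frequency_for_deadline(cycles, deadline_s)
ratio = dynamic_energy(cycles, f_low) / dynamic_energy(cycles, f_max)
print(f"run at {f_low / 1e6:.0f} MHz, energy ratio vs. full speed: {ratio:.2f}")
```

In this model the energy scales with the square of the frequency for a fixed amount of work, so stretching the task to the deadline at half speed uses a quarter of the dynamic energy. Real systems also pay idle and static power, which can shift the balance.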

Four Considerations for DVFS

- Workload amount
  - Adjust the processor frequency depending on the load
- Workload characteristics
  - Compute-intensive vs. memory-intensive
- Deadline constraints
  - Lowest possible frequency for meeting deadlines
- Load balancing
  - Migrate or scale?

Workload Amount and DVFS

- Static approaches
  - Performance policy
    - CPU runs at the maximum frequency regardless of load
  - Power save policy
    - CPU runs at the minimum frequency regardless of load
- Dynamic approaches
  - On-demand policy
    - Increases the clock speed to the maximum frequency when the system load rises above the predefined threshold
    - Decreases the clock speed gradually when the system load falls below the predefined threshold
  - Conservative policy
    - Increases the CPU speed gradually rather than jumping to the maximum speed
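The difference between the two dynamic policies can be sketched as a single decision step. This is a simplified illustration of the behavior described on the slide, not the Linux kernel's governor code; the frequency table, the threshold value, and the function names are all assumptions.

```python
FREQS_MHZ = [800, 1200, 1600, 2000]  # hypothetical available frequencies, ascending


def ondemand_step(current_mhz: int, load: float, threshold: float = 0.8) -> int:
    """Jump to the maximum above the threshold, step down one level below it."""
    i = FREQS_MHZ.index(current_mhz)
    if load > threshold:
        return FREQS_MHZ[-1]
    if load < threshold and i > 0:
        return FREQS_MHZ[i - 1]
    return current_mhz


def conservative_step(current_mhz: int, load: float, threshold: float = 0.8) -> int:
    """Step up one level above the threshold, step down one level below it."""
    i = FREQS_MHZ.index(current_mhz)
    if load > threshold and i < len(FREQS_MHZ) - 1:
        return FREQS_MHZ[i + 1]
    if load < threshold and i > 0:
        return FREQS_MHZ[i - 1]
    return current_mhz


print(ondemand_step(800, 0.9), conservative_step(800, 0.9))
```

Starting from the lowest frequency under heavy load, the on-demand step goes straight to the top frequency while the conservative step moves up one level at a time.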

Workload Characteristics and DVFS

- Two types of workload
  - Compute-intensive
    - The program execution is exclusively bound to the processor
  - Memory-intensive
    - The program makes heavy access to memory
    - The processor would spend a significant fraction of the time waiting for memory
- A simple solution
  - High processor frequency and low memory frequency for compute-intensive load
  - Low processor frequency and high memory frequency for memory-intensive load
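A simple two-term execution-time model shows why lowering the CPU clock hurts memory-intensive code much less. The sketch assumes execution time is CPU cycles divided by the clock plus a memory-stall term that does not depend on the CPU clock; the workload numbers are hypothetical, and the two clock values are used purely as example endpoints.

```python
def execution_time(cpu_cycles: float, mem_stall_s: float, freq_hz: float) -> float:
    """Time = compute part (scales with the clock) + memory stalls (assumed fixed)."""
    return cpu_cycles / freq_hz + mem_stall_s


def slowdown(cpu_cycles: float, mem_stall_s: float, f_high: float, f_low: float) -> float:
    """Factor by which execution time grows when the clock drops from f_high to f_low."""
    return execution_time(cpu_cycles, mem_stall_s, f_low) / execution_time(
        cpu_cycles, mem_stall_s, f_high
    )


F_HIGH, F_LOW = 733e6, 333e6
# Hypothetical workloads that both take 1 s at 733 MHz.
compute_bound = slowdown(733e6, 0.0, F_HIGH, F_LOW)   # all time spent computing
memory_bound = slowdown(73.3e6, 0.9, F_HIGH, F_LOW)   # 90% of time waiting for memory
print(f"compute-bound slowdown: {compute_bound:.2f}x, memory-bound: {memory_bound:.2f}x")
```

Under this model the compute-bound workload slows down by the full frequency ratio, while the memory-bound one barely changes, so the lower clock costs it little performance.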

CPU- vs. Memory-Intensive

- Execution time variation
  - CPU frequency ranging from 733 MHz to 333 MHz

GPU- and Memory-Intensive

- Compute-intensive applications
  - Dense matrix multiplication
  - Run on NVIDIA GeForce GTX 280 GPU

GPU- and Memory-Intensive

- Memory-intensive applications
  - Dense matrix transpose
  - Run on NVIDIA GeForce GTX 280 GPU

Load Balancing and DVFS

- DVFS can be applied independently to each processor on multicore hardware
  - But this may not lead to optimal power saving from a global point of view
- A simple scenario
  - We need to decide whether to transfer a thread from processor A to an idle processor B, or to increase the frequency of A
  - Compute P_migrate_from_A_to_B and P_increase_A_freq
  - Transfer if P_migrate_from_A_to_B < P_increase_A_freq
  - Otherwise, increase the frequency of A
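The migrate-or-scale rule above reduces to one comparison once the two power figures are known. In the sketch below, the per-core power model (a static term plus a cubic dynamic term), the idle power value, and all names are assumptions made for illustration; the slides do not specify how the two powers are computed.

```python
def core_power(freq_ghz: float, static_w: float = 1.0, k: float = 1.0) -> float:
    """Hypothetical core power: static part plus a cubic dynamic part."""
    return static_w + k * freq_ghz ** 3


def choose_action(p_migrate_from_a_to_b: float, p_increase_a_freq: float) -> str:
    """Transfer the thread if migrating costs less power; otherwise scale A up."""
    if p_migrate_from_a_to_b < p_increase_a_freq:
        return "migrate"
    return "increase_frequency"


IDLE_CORE_W = 0.2  # hypothetical power of a core left idle

# Option 1: wake B and run both cores at 1 GHz.
p_migrate = core_power(1.0) + core_power(1.0)
# Option 2: keep B idle and run A at 2 GHz.
p_increase = core_power(2.0) + IDLE_CORE_W

print(choose_action(p_migrate, p_increase))
```

With these made-up numbers, spreading the work over two slow cores is cheaper than doubling the clock of one, but a larger static term or a costly migration would tip the decision the other way.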

Case Study: Intel Core2 Duo E6850

TABLE II
VOLTAGE (V_cpu (V)) AND CLOCK FREQUENCY (f_cpu (GHz)) LEVELS FOR INTEL CORE2 DUO E6850 PROCESSOR.

DVFS level   V_cpu (V)   f_cpu (GHz)
Level 1      1.30        3.074
Level 2      1.25        2.852
Level 3      1.20        2.588
Level 4      1.15        2.281
Level 5      1.10        1.932
Level 6      1.05        1.540

TABLE III
MEASURED AND ANALYTICAL MODELS OF INTEL CORE2 DUO E6850 POWER CONSUMPTION.

V_cpu (V)   f_cpu (GHz)   Measurement (W)   Analytical model (W)
1.056       1.776         21.520            21.212
1.080       1.888         24.000            23.956
1.104       2.004         26.320            26.856
1.160       2.338         33.760            34.838
1.224       2.672         43.200            44.409
1.280       3.006         55.440            54.236

[...] reflects the supply voltage and frequency changes. We choose a high-end DVFS-enabled microprocessor, i.e., Intel Core2 Duo E6850 processor, along with the LTC3733 3-phase synchronous step-down DC–DC converter that supports discontinuous mode, which is a representative setup of a modern high-performance DVFS-enabled microprocessor.

The microprocessor power consumption model is described in (18). The parameters C_e, a_1, and a_2 are obtained from actual measurements. We insert a shunt monitor circuit right in front of the DC–DC converter of the Intel Core2 Duo E6850 processor, and measure the power supply current with an Agilent A34401 digital multimeter. We compensate the DC–DC converter efficiency from the measured current values, and characterize I_O. We run PrimeZ benchmark and change V_cpu and f_cpu performing direct access to the BIOS (basic input/output system) as described in Table II because the Intel SpeedStep supports only two voltage levels. We finally derive the following power consumption model:

$$P_{cpu} = 8.4503\,V_{cpu}^{2}\,f_{cpu} + (36.3851\,V_{cpu} - 33.9503), \qquad (30)$$

where the units of P_cpu, V_cpu, and f_cpu are W, V, and GHz, respectively. The difference between the analytical model and measurement results is less than 4.6% as shown in Table III. The DC–DC converter parameters are given in Table IV. The values are chosen according to guidelines in datasheet and reference designs offered by the vendor.
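The fitted model in (30) can be evaluated directly against the measurements listed in Table III. The sketch below uses only the coefficients and the measured rows given in the excerpt; the function and variable names are illustrative.

```python
def cpu_power_w(v_cpu: float, f_cpu_ghz: float) -> float:
    """Power model (30): P = 8.4503 * V^2 * f + (36.3851 * V - 33.9503), in watts."""
    return 8.4503 * v_cpu ** 2 * f_cpu_ghz + (36.3851 * v_cpu - 33.9503)


# (V_cpu in V, f_cpu in GHz, measured power in W) from Table III.
TABLE_III = [
    (1.056, 1.776, 21.520),
    (1.080, 1.888, 24.000),
    (1.104, 2.004, 26.320),
    (1.160, 2.338, 33.760),
    (1.224, 2.672, 43.200),
    (1.280, 3.006, 55.440),
]


def max_relative_error(rows) -> float:
    """Largest relative gap between the model and the measurement."""
    return max(abs(cpu_power_w(v, f) - measured) / measured for v, f, measured in rows)


for v, f, measured in TABLE_III:
    print(f"V={v:.3f} f={f:.3f} measured={measured:.3f} model={cpu_power_w(v, f):.3f}")
print(f"max relative error: {max_relative_error(TABLE_III):.1%}")
```

Evaluating the model this way stays within the 4.6% bound quoted in the text for every row of the table.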

The delay overhead of DVFS transition is given in Table V.

TABLE IV
DC–DC CONVERTER PARAMETERS OF LTC3733 3-PHASE CONVERTER FOR INTEL CORE2 DUO E6850.

Parameter   Value
V_IN        12 V
V_OUT       V_O in Table II
C           8840 µF
L           1 µH per phase
R_L         2.3 mΩ
f_DC        530 kHz per phase
max(I_L)    75 A

TABLE V
DVFS TRANSITION DELAY OVERHEAD FOR INTEL CORE2 DUO E6850 PROCESSOR WITH LTC3733 CONVERTER. (T_pll = 5 µs AND THE NUMBER OF CYCLES ARE CALCULATED AT f = 3.074 GHz).

             Actual value (µs)           Proposed model (µs)
Level        T_uc    Total   Cycles      T_uc    Total   Cycles
2→1           4.77    9.77    30018       4.11    9.11    28011
3→1          12.29   17.29    53141      12.21   17.21    52890
3→2           5.95   10.95    33672       5.72   10.72    32950
4→1          22.29   27.29    83894      24.24   29.24    89894
4→2          14.69   19.69    60531      16.21   21.21    65201
4→3           7.33   12.33    37921       8.06   13.06    40150
5→1          34.81   39.81   122389      40.37   45.37   139457
5→2          26.47   31.47    96733      31.44   36.44   112025
5→3          27.33   32.33    99383      21.87   26.87    82606
5→4           9.43   14.43    44361      11.49   16.49    50694
6→1          57.68   62.68   192684      60.90   65.90   202590
6→2          49.50   54.50   167525      51.65   56.65   174157
6→3          31.89   36.89   113409      41.52   46.52   142994
6→4          28.35   33.35   102531      30.14   35.14   108006
6→5          12.14   17.14    52688      16.84   21.84    67141
Downscale     0       5       15370       0       5       15370

The value of T_pll, 5 µs, is specified in the Intel Core2 Duo E6850 datasheet. The actual values are obtained from SPICE simulation results. We obtain T_X by observing the settling time of V_O(t) from SPICE results, and substitute it into equations in Section IV to calculate the delay overhead. The estimated overhead from the proposed macro model follows the trend of the actual values well. For upscaling, the delay overhead is the sum of the underclocking-related overhead T_uc and the PLL lock time loss T_pll. Unlike the assumption of previous works, the underclocking-related overhead is the dominant factor for most cases as we have discussed in Section III. For downscaling, PLL lock time is the only delay overhead, and thus the overhead values are the same for all cases.

The energy overhead values of a DVFS transition for continuous- and discontinuous-mode operations are given in Tables VI and VII, respectively. For the actual value, we obtain I_L(t), I_O(t), and V_O(t) from SPICE simulation and substitute them into (14), (16), and (19). There is no E_cap in the Tables as it is implied in E_ir. The value of E_ir for the case 1 → 6 in Table VI is large because it drains a significant amount of charge from the bulk capacitor to the ground. On the other hand, E_ir for the same case in Table VII is much smaller because it uses most of the stored charge to supply the load. This result is very different from previous models such as [6] as they simply calculate the overhead based on the charge transfer to and from the bulk capacitor.

B. Case 2: ARM Cortex-A8 Processor

The second target DVFS system is the ARM Cortex-A8 processor with LTC3446 converter. ARM Cortex-A8 processor is an application processor targeting high-end mobile products such as smartphones, tablets, and netbooks. It exhibits power consumption of 600 mW at full speed. We perform a procedure


Case Study: Exynos 4210

[Figure: DVFS overhead (Exynos 4210)]

Linux Power Management Architecture

[Figure: Linux power management architecture, including a policy management layer (governors) and a device driver layer. Source: Amit Kucheria at 2011 Embedded Linux Conference]

CPUidle Architecture

[Figure: CPUidle architecture]

Page 31: POWER MANAGEMENT AND ENERGY EFFICIENCYcsl.skku.edu/uploads/ECE5658S17/week5.pdf · 2017-03-31 · Firmware-Level ACPI Architecture ¨Three components ¤ACPI tables nContain definition

CPUidle Governors

¨ Ladder Governor
¤ Takes a simple, step-wise approach to selecting an idle state
¤ Enters the lightest state first, and will only move on to the next deeper state if a sleep was long enough

¨ Menu Governor
¤ Picks the deepest possible idle state straight away
¤ Considers the expected sleep time, latency requirements, previous C-state residency, etc.
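The difference between the two governors can be illustrated with a small Python sketch. It is a simplification for teaching purposes, not the kernel's code: the `IdleState` fields, the `STATES` table and its numbers, and the function names `menu_select` and `ladder_step` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    exit_latency_us: int      # time needed to wake up from this state
    target_residency_us: int  # shortest sleep for which the state pays off


# Hypothetical idle-state table, ordered from lightest to deepest.
STATES = [
    IdleState("C1", 2, 10),
    IdleState("C2", 50, 200),
    IdleState("C3", 300, 2000),
]


def menu_select(states, predicted_sleep_us, latency_limit_us):
    """Menu-style choice: go directly to the deepest state that fits
    both the predicted sleep length and the latency requirement."""
    choice = 0
    for i, state in enumerate(states):
        if (state.target_residency_us <= predicted_sleep_us
                and state.exit_latency_us <= latency_limit_us):
            choice = i
    return choice


def ladder_step(states, current, last_sleep_us):
    """Ladder-style choice: move at most one rung per idle period.
    Go deeper if the last sleep was long enough for the next state,
    go shallower if it was too short for the current one."""
    if (current + 1 < len(states)
            and last_sleep_us >= states[current + 1].target_residency_us):
        return current + 1
    if current > 0 and last_sleep_us < states[current].target_residency_us:
        return current - 1
    return current
```

With a predicted 5 ms sleep and a loose latency limit, `menu_select` returns the index of C3 at once, while `ladder_step` starting from C1 only advances to C2 and needs another long sleep before it reaches C3.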

Page 32

Idle Task

¨ When there are no runnable processes, CFS schedules the idle task (PID 0)


Page 33

Tickless Idle

¨ Traditional systems use a periodic interrupt 'tick'
¤ Update the system clock
¤ Tick requires wakeup from idle state

¨ Tickless idle eliminates the periodic timer tick when the CPU is idle
¤ The CPU can remain in power saving states for a longer period of time, reducing the overall system power consumption
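A rough way to see the benefit is to count how often an idle CPU is woken up in each mode. The Python sketch below is only an illustration: the tick rate of 250 per second and the timer deadlines are assumed values, and the function names are made up for this example.

```python
def wakeups_periodic(idle_s, hz):
    """Tick interrupts that wake the CPU during idle_s seconds
    when the tick fires hz times per second."""
    return int(idle_s * hz)


def wakeups_tickless(idle_s, timer_deadlines_s):
    """With the tick stopped, only timers that expire inside the
    idle period wake the CPU. Deadlines are seconds from idle entry."""
    return sum(1 for t in timer_deadlines_s if 0 <= t < idle_s)
```

For a 2-second idle period at an assumed 250 ticks per second, the periodic tick causes 500 wakeups. With pending timers at 0.5 s, 1.5 s, and 3.0 s, the tickless CPU wakes only twice in the same period.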

Page 34

CPUFreq Architecture


Page 35

CPUFreq Governor


Governor: Operations

Performance
• Always sets the CPU to the highest frequency between scaling_min_freq and scaling_max_freq

Powersave
• Always sets the CPU to the lowest frequency between scaling_min_freq and scaling_max_freq

Ondemand
• Sets the frequency depending on the current usage
• Rapidly increases the frequency and gracefully decreases the frequency

Conservative
• Basically operates like ondemand
• Gracefully increases and decreases the frequency

Userspace
• Sets the CPU to the frequency that the user writes to scaling_setspeed
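The three governors that do not react to load can be summarized in a few lines of Python. This is a sketch of the policy described above rather than kernel code: the function names are hypothetical, and clamping the userspace request to the scaling_min_freq/scaling_max_freq range is an assumption of this example.

```python
def clamp(freq_khz, min_khz, max_khz):
    """Keep a requested frequency inside the allowed range."""
    return max(min_khz, min(freq_khz, max_khz))


def static_governor_target(governor, scaling_min_freq, scaling_max_freq,
                           scaling_setspeed=None):
    """Target frequency (kHz) for the governors that ignore load."""
    if governor == "performance":
        return scaling_max_freq
    if governor == "powersave":
        return scaling_min_freq
    if governor == "userspace":
        if scaling_setspeed is None:
            raise ValueError("userspace needs a value for scaling_setspeed")
        # Assumption: an out-of-range request is clamped to the limits.
        return clamp(scaling_setspeed, scaling_min_freq, scaling_max_freq)
    raise ValueError(f"not a static governor: {governor}")
```

With limits of 200000 kHz and 1200000 kHz, performance always yields 1200000, powersave always yields 200000, and userspace yields whatever was requested as long as it lies inside that range.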

Page 36

Ondemand Governor


[Figure: frequency (between min and max) and load plotted over time, with up_threshold = 95%]
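The behavior in the figure can be approximated by the Python sketch below. It is a simplified model, not the kernel's algorithm: the jump to the maximum frequency above up_threshold follows the slide, while choosing a frequency proportional to load below the threshold is an assumption of this example, as is the function name `ondemand_next`.

```python
def ondemand_next(load_pct, min_khz, max_khz, up_threshold=95):
    """Next frequency (kHz) for an ondemand-style policy.

    Above up_threshold the policy jumps straight to the maximum.
    Otherwise the frequency is scaled with the load between the
    minimum and the maximum (assumed rule for this sketch)."""
    if load_pct > up_threshold:
        return max_khz
    return min_khz + (max_khz - min_khz) * load_pct // 100
```

With limits of 200000 kHz and 1200000 kHz, a load of 97% selects 1200000 kHz immediately, while a load of 50% selects 700000 kHz.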

Page 37

Conservative Governor


[Figure: frequency (between min and max) and load plotted over time, with up_threshold = 80% and down_threshold = 20%]
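A minimal Python sketch of the stepwise behavior in the figure follows. The thresholds come from the slide. The step size of 5% of the maximum frequency and the function name `conservative_next` are assumptions made for illustration.

```python
def conservative_next(load_pct, cur_khz, min_khz, max_khz,
                      up_threshold=80, down_threshold=20, freq_step_pct=5):
    """Next frequency (kHz) for a conservative-style policy.

    The frequency moves by one fixed step per sample: up when the load
    exceeds up_threshold, down when it falls below down_threshold, and
    it stays unchanged in between."""
    step_khz = max_khz * freq_step_pct // 100
    if load_pct > up_threshold:
        return min(cur_khz + step_khz, max_khz)
    if load_pct < down_threshold:
        return max(cur_khz - step_khz, min_khz)
    return cur_khz
```

Starting at 200000 kHz with a maximum of 1200000 kHz, a 90% load raises the frequency by one 60000 kHz step to 260000 kHz, so reaching the maximum takes many sampling periods instead of one.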

Page 38

Basic Operations of CPUfreq

¨ Sample the processor utilization periodically
¨ Adjust frequency based on the utilization
¨ Adjust voltage based on frequency
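The three steps can be sketched in Python as follows. Everything specific here is hypothetical: the operating-point table `OPP_TABLE` and its voltages, the 80% headroom target, and the function names. The sketch only shows the order of the steps, not a real driver.

```python
# Hypothetical operating points: (frequency in kHz, voltage in mV),
# sorted by ascending frequency.
OPP_TABLE = [(200_000, 950), (500_000, 1000), (800_000, 1100), (1_200_000, 1250)]


def utilization_pct(prev, cur):
    """Step 1: utilization over one sampling period, computed from two
    (busy_time, total_time) counter samples."""
    busy = cur[0] - prev[0]
    total = cur[1] - prev[1]
    return 0 if total <= 0 else 100 * busy // total


def pick_opp(load_pct, cur_khz, opp_table, headroom_pct=80):
    """Steps 2 and 3: choose the lowest frequency at which the measured
    load would stay at or below headroom_pct, then take the voltage
    paired with that frequency in the table."""
    needed_khz = cur_khz * load_pct // headroom_pct
    for freq_khz, volt_mv in opp_table:
        if freq_khz >= needed_khz:
            return freq_khz, volt_mv
    return opp_table[-1]
```

For example, a CPU running at 800000 kHz with 40% utilization needs about 400000 kHz under this rule, so the sketch selects the 500000 kHz point and its table voltage of 1000 mV.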


