Transcript
Page 1: Measuring and Managing Power Consumption · Measuring and Managing Power Consumption Todd Rosedahl IBM/POWER Firmware Development ... July 2014 Power8 open source firmware stack contributed

Measuring and Managing Power Consumption

Todd Rosedahl

IBM/POWER Firmware Development

Page 2

Motivation

• US datacenter energy consumption: 91 billion kWh
• Equivalent to 34 power plants of 500 MW each
• State of Minnesota: 68 billion kWh
• Power Usage Effectiveness (PUE)
• Energy Star, SERT, regulations

2 11/3/2016 © 2015 OpenPOWER Foundation

Page 3

Agenda

• System Stack/Ecosystem
• On Chip Controller (OCC) Overview
• Hardware Block Diagram
• Functional Details
• Measurement
• Future
• References

Page 4

Ecosystem Enablement

Power Open Source Software Stack Components

[Diagram: stack layers labeled Cloud Software, Operating System / KVM, Standard Operating Environment (System Mgmt), Firmware, and Hardware, annotated with "Existing Open Source Software Communities", "New OSS Community", "OpenPOWER Firmware", and "OpenPOWER Technology"]

• July 2014: POWER8 open source firmware stack contributed through GitHub
• Toolkits and resources for porting and optimizing; growing repository on website
• AMESTER now open source
• OpenBMC now open source

[Diagram components: P8 HW, KVM/Linux, OPAL, Host Boot, SBE, OCC, OpenBMC, System, AMESTER, ETH]

Page 5

OpenBMC has been released

Rackspace Barreleye
http://openpowerfoundation.org/blogs/openpower-open-compute-rackspace-barreleye/

• 2 Power CPU Sockets

• 192 HW Threads

• 32 DIMM Slots

• 32 TB DDR3

• 15 Hot swap drives, 3 PCIe slots

• Open Compute Compliant server

Next generation OpenBMC stack

• P9 POWER ready

• Google/Rackspace (Zaius)

• IBM (Witherspoon)

• Modern code stack: D-bus, journaling

• Error logs with part numbers, location codes

• Interface traces and simulation

• OpenPOWER toolkit enabled

• Extendable – add your own features/functions


Page 6

On Chip Controller (OCC) Overview

What is OCC?
• Hardware/firmware that controls system power, performance, and thermals
• 405 processor with 512 KB of dedicated RAM
• General Purpose Engines (GPEs) to offload the 405

What does OCC do?
• Reads/controls system power
• Reads/controls chip temperatures
• Enables efficient fan control
• Provides over-temperature (OT) protection
• Power capping
• Fault tolerance
• Energy saving
• Performance boost
• Enables sophisticated measurement and profiling

[Diagram, physical paths not shown. Blocks: 405 and GPE (OCC runs on), Processor, Memory, Power Measurement, VRMs, Fan Control Actuation, OPAL (Sys Mem), BMC, HostBoot, other OCC. Link labels: Register Reads/Writes, Measures/Actuates, Reads, Writes, Communicates, Loads, Loaded and initialized by, Communicates to other OCC]

Page 7

Hardware Block Diagram

Main Memory
• Initial load
• Temperature sensing
• Utilization measurement
• Bandwidth control
• OPAL communication
• OCC communication

Processor
• Temperature sensing
• Core frequency control (PSTATES)
• Chip voltage control (PSTATES)
• Utilization measurement

BMC
• Report power/temp
• Provide power cap
• DCMI compliance
• Fan control

Page 8

Power/Thermal Management

Linux Governors (OCC measures/clips for capping/OT)
• Balanced
• Performance
• Power save

OCC Controls
• Maximum frequency
• Power capping
• Over-temperature protection (processors/memory)
• Fault tolerance (AC feed or power supply loss)

Processor Idle States
• Nap
• Sleep
• Winkle

Power Capping
• DCMI commands set the power limit
• BMC can set N and N+1 system power caps

Page 9

OCC Voting Box

[Diagram: frequency voting inside the OCC complex]

Frequency Voting Box (every 250 µs): lowest Pstate (freq) wins
• Desired PSTATE from o/s (Linux)
• Maximum PSTATE
• Thermal Control Vote (every 16 ms): read the current 2 ms hottest core temp, compare temp to limit, and set the vote
• Control Vote for power (every 250 µs): read the current 250 µs power reading, compare power to limits, and set the vote

Sensor inputs
• Core DTS: read every 2 ms, average all DTS to get core temp, and take the hottest core temp
• Power measurement hardware: read power every 250 µs

Output
• Actual PSTATE goes to the PMC (hardware that actuates V/F slew), which drives the VRM and the frequency setting

BMC
• Gets temps from OCC in poll response for export and for fan control
• Gets power from OCC in poll response for export
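The voting scheme on this slide can be sketched in a few lines of Python. This is only an illustration, not the OCC firmware: votes are expressed as frequencies in MHz rather than Pstate numbers, every function name is hypothetical, and the control vote uses an assumed "drop to minimum when over the limit" policy that the slides do not describe.

```python
def core_temperature_c(dts_readings_c):
    """Average all DTS readings of one core, as the slide describes."""
    return sum(dts_readings_c) / len(dts_readings_c)


def hottest_core_temperature_c(per_core_dts_c):
    """Return the hottest of the per-core averaged temperatures."""
    return max(core_temperature_c(dts) for dts in per_core_dts_c)


def control_vote_mhz(measured, limit, f_max_mhz, f_min_mhz):
    """Hypothetical policy: vote for the minimum frequency while over the limit."""
    return f_min_mhz if measured > limit else f_max_mhz


def voting_box_mhz(*votes_mhz):
    """Lowest frequency vote wins."""
    return min(votes_mhz)


if __name__ == "__main__":
    # Two cores with two DTS readings each (made-up values).
    hottest = hottest_core_temperature_c([[61.0, 63.0], [70.0, 74.0]])
    thermal_vote = control_vote_mhz(hottest, 85.0, 4000, 2000)
    power_vote = control_vote_mhz(1250.0, 1200.0, 4000, 2000)
    # Desired frequency from the o/s, maximum frequency, and the two control votes.
    actual = voting_box_mhz(3600, 4000, thermal_vote, power_vote)
    print(hottest, actual)
```

In this made-up run the power reading exceeds its limit, so the power vote is the lowest and sets the actual frequency even though the o/s asked for more.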

Page 10

Measurement – Out of Band

[Diagram: node containing power/thermal measurement hardware, OCC, CPU, host OS, main memory, and BMC]

IB pathway
• Tighter workload coupling
• Performance concerns
• Jitter concerns

OOB pathway
• Measure without affecting system
• Difficult to correlate to system jobs/events

BMC interfaces
• IPMI sensors: inlet temp, proc core temps, mem temps, all power rails
• DCMI commands: node power
• AMESTER: IPMI OCC pass-through

Page 11

AMESTER Overview

[Diagram: AMESTER, Internet, service processor, POWER8 system, OCC, measurement hardware, P8 chip]

AMESTER: Automated MEasurement for System Temperature and Energy Reporting

Uses management network and service processors
• Does not interfere with operating system or workloads
• Runs without requiring any additional installation on the server

Sensor data collection: power, temperature, performance measurement
• All power rails, core temps, utilization, IPS, bandwidth, frequency, etc.

Continuous graphing mode

Trace buffer mode: output to file for import to Excel/Matlab
• 250 µs tracing into 8 KB buffer: 1 sensor logged every 250 µs for 1 second
• 2 ms tracing into 8 KB buffer: 1 sensor logged every 2 ms for 8 seconds
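The trace durations quoted above follow from the buffer size if each logged sample takes 2 bytes. That sample size is an assumption (the slide does not state it), chosen because it reproduces the quoted 1 second and 8 second figures; the function name is illustrative.

```python
BUFFER_BYTES = 8 * 1024  # 8 KB trace buffer, from the slide
SAMPLE_BYTES = 2         # assumption: one logged sensor value takes 2 bytes


def trace_capacity(interval_s, buffer_bytes=BUFFER_BYTES, sample_bytes=SAMPLE_BYTES):
    """Return (number of samples, seconds covered) for one sensor traced into the buffer."""
    samples = buffer_bytes // sample_bytes
    return samples, samples * interval_s


if __name__ == "__main__":
    for label, interval_s in (("250 us", 250e-6), ("2 ms", 2e-3)):
        samples, seconds = trace_capacity(interval_s)
        print(f"{label}: {samples} samples, about {seconds:.3f} s")
```

Under that assumption the buffer holds 4096 samples, which covers roughly 1 second at 250 µs and roughly 8 seconds at 2 ms.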

Parameter setting: power/thermal trip points, pin frequency, etc.

Page 12

Simple Example of Sensor Data

Page 13

Sensor Description

Page 14

Sample Sensor List

440 total sensors: power, thermal, performance, utilization

Page 15

Insights

• Visualization is key to rapid prototyping and problem solving
• Correlation of power consumption with other metrics
• Time-alignment of sensor data is crucial for debugging firmware algorithms
• Examples
  • Measuring settling time of power capping controller after workload changes
  • Developing dynamic voltage/frequency scaling algorithms
  • Discovering small non-steady behavior when steady-state behavior was expected

Page 16

Future – P9

Page 17

Hardware Block Diagram – P9

OCC (On Chip Controller)
• WOF (Workload Optimized Frequency)

OpenBMC
• FSI to SBE, not I2C
• Move to RESTful interfaces

Main Memory
• Initial load
• Temperature sensing
• Utilization measurement
• OPAL communication

New sensors
In-band AMESTER

Processor
• Temperature sensing
• Core frequency control
• Chip voltage control
• Utilization measurement
• 24 cores (quads)
• Instant on

Page 18

P9 OCC Complex with Quads/PPEs

• Hypervisor makes V/F requests per core
• Frequency is per quad (highest core request)
• IVRMs allow per-quad voltage
• External voltage is per chip
• Temp sensors
  • 2 per core
  • 1 per cache
  • 3 per nest
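The "frequency is per quad (highest core request)" rule can be sketched as follows. This is an illustration under assumptions: cores are numbered contiguously so that core // 4 gives the quad index (four SMT4 cores per quad, as in the slide diagram), requests are in MHz, and the function name is hypothetical.

```python
from collections import defaultdict

CORES_PER_QUAD = 4  # from the slide diagram: four SMT4 cores per quad


def quad_frequencies(core_requests_mhz):
    """Map {core_id: requested MHz} to {quad_id: granted MHz}.

    Each quad runs at the highest frequency requested by any of its cores.
    Assumes core_id // CORES_PER_QUAD identifies the quad.
    """
    quads = defaultdict(int)
    for core_id, freq_mhz in core_requests_mhz.items():
        quad_id = core_id // CORES_PER_QUAD
        quads[quad_id] = max(quads[quad_id], freq_mhz)
    return dict(quads)


if __name__ == "__main__":
    # Cores 0 and 1 share quad 0; core 4 is in quad 1 (made-up requests).
    print(quad_frequencies({0: 2000, 1: 3000, 4: 2500}))
```

A core that asks for a lower frequency than a sibling in the same quad still runs at the quad frequency.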

[Diagram: P9 chip]
• OCC Complex: OCC (405), two GPEs, PGPE, SGPE, 768 KB SRAM
• AVSbus interfaces
• Six quads, each with a DPLL, four SMT4 core chiplets, two CMEs, two NCUs, two L2s, and two 10 MB L3s
• Twelve 32 KB SRAMs (two per quad)
• SBE (Self-Boot Engine): 96 KB PIB MEM, FastI2C
• Three IO PPEs, each with 64 KB SRAM
• PB PPE (Powerbus PPE) with 16 KB SRAM. Note: also used for NV-link management

Page 19

Workload Optimized Frequency (WOF)

• Less active workloads can go to higher frequencies
• Lower active core counts also increase top frequency potential
• Actual PSTATE selected depends on Linux governor
• As with Turbo, higher ambients will affect part-to-part determinism
• Guaranteed no worse than existing turbo frequency
• New OCC/OPAL interface
• New Ultra Turbo frequency point
• Part-to-part determinism achieved by factoring out leakage current

[Chart: axes labeled Frequency and Power, marking Nominal Spec, Turbo Spec, Turbo Freq, and Ultra Turbo Freq]

Page 20

Graphical Processing Units (GPUs)

What are these things (and why are they such a big deal)?
• Roots in image rendering (what color should that pixel be?)
• One operation on lots of data at the same time
• Think weather simulation
• Now used for AI/machine learning
• Key for speech/image recognition

Challenges
• High power
• How to maximize CPU/GPU performance
• Data collection

Page 21

CPU/GPU Interface

Page 22

Measurement – In Band vs Out of Band

[Diagram: node containing GPUs, OCC, CPU, host OS, main memory, and BMC/FSP]

IB pathway
• Tighter workload coupling
• Performance concerns
• Jitter concerns

New for P9
• Detailed power and performance data
• In-band AMESTER
• GPU sensing and correlation

OOB pathway
• Measure without affecting system performance
• Difficult to correlate to system jobs/events

BMC/FSP interfaces: move to REST (Redfish)
• IPMI/DCMI: inlet temp, proc core temps, mem temps, all power rails, node power
• CIM (PowerVM): buffers of power/thermal readings
• Pass-through (AMESTER): get anything

Page 23

Measurement – In Band Sensors

[Diagram: node containing power/thermal measurement hardware, OCC, CPU, host OS, main memory, and BMC/FSP]

• 40+ different types of sensor readings

• Power, performance, utilization

• 400+ total sensors (24 cores scales this up)

• Timestamped with the system timestamp

• Accumulator and update tag support for energy calculations

• Min/max support – and clearing of min/max

• 4KB pushed up every 10ms

• All sensors updated every 100ms

• Read by the o/s, job profilers
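A profiler could use the accumulator and update tag to compute energy over an interval, roughly as sketched below. The field semantics are assumptions, not taken from the slides: the accumulator is treated as a running sum of power samples in watts, the update tag as the count of samples added to it, and the timestamp as seconds. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SensorSnapshot:
    """One reading of a power sensor (hypothetical layout)."""
    timestamp_s: float  # assumed: system timestamp converted to seconds
    accumulator: int    # assumed: running sum of power samples, in watts
    update_tag: int     # assumed: number of samples summed into the accumulator


def average_power_w(first, second):
    """Average power between two snapshots: delta accumulator / delta update tag."""
    samples = second.update_tag - first.update_tag
    if samples <= 0:
        raise ValueError("no new samples between snapshots")
    return (second.accumulator - first.accumulator) / samples


def energy_joules(first, second):
    """Energy over the interval: average power times elapsed time."""
    return average_power_w(first, second) * (second.timestamp_s - first.timestamp_s)


if __name__ == "__main__":
    start = SensorSnapshot(timestamp_s=0.0, accumulator=1000, update_tag=10)
    end = SensorSnapshot(timestamp_s=2.0, accumulator=7000, update_tag=30)
    print(average_power_w(start, end), energy_joules(start, end))
```

Working from differences between two snapshots means the reader never has to catch every individual sample, which fits a block that is only pushed up periodically.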


Page 24

Summary

Server power consumption is a large part of TCO
• Data center power consumption is going up
• Power/performance are linked
• GPUs provide amazing performance benefits, but also power challenges

OpenPOWER enables innovation and custom solutions
• Full firmware stack and tools are open
• Host Boot, OPAL, OCC, BMC, AMESTER
• Detailed power/performance data available

Page 25

References

OpenPOWER blog link:

http://openpowerfoundation.org/press-releases/occ-firmware-code-is-now-open-source/

GitHub pages:

OpenPOWER project: https://github.com/open-power

OCC: https://github.com/open-power/occ

Building OpenPOWER: https://github.com/open-power/op-build

OpenBMC: https://github.com/openbmc/openbmc

AMESTER: https://github.com/open-power/amester

OCC Sensor collection document: https://github.com/open-power/docs/blob/master/occ/OCC_ipmitool_sensors.pdf

