Date post:24-Sep-2020
  • Measuring and Managing Power Consumption

    Todd Rosedahl

    IBM/POWER Firmware Development

  • Motivation

    • US datacenter energy consumption – 91 billion kWh
    • Equivalent to the output of 34 power plants of 500 MW each
    • State of Minnesota – 68 billion kWh
    • Power Usage Effectiveness
    • Energy Star, SERT, Regulations

    2 11/3/2016 © 2015 OpenPOWER Foundation

  • Agenda

    • System Stack/Ecosystem
    • On Chip Controller (OCC) Overview
    • Hardware Block Diagram
    • Functional Details
    • Measurement
    • Future
    • References

  • Power Open Source Software Stack Components

    Stack layers: Cloud Software; Operating System / KVM; Standard Operating Environment (System Mgmt); Firmware; Hardware

    Diagram labels: Existing Open Source Software Communities; New OSS Community; OpenPOWER Technology; OpenPOWER Firmware

    Ecosystem Enablement
    • July 2014: Power8 open source firmware stack contributed through GitHub
    • Toolkits and resources for porting and optimizing, growing repository on website
    • AMESTER now Open Source
    • OpenBMC now Open Source

    [Diagram blocks: System, P8 HW, KVM/Linux, OPAL, Host Boot, SBE, OCC, OpenBMC, AMESTER, ETH]

  • OpenBMC has been released

    Rackspace -- Barreleye
    http://openpowerfoundation.org/blogs/openpower-open-compute-rackspace-barreleye/

    • 2 Power CPU Sockets

    • 192 HW Threads

    • 32 DIMM Slots

    • 32 TB DDR3

    • 15 Hot swap drives, 3 PCIe slots

    • Open Compute Compliant server

    Next generation OpenBMC stack

    • P9 POWER ready

    • Google/Rackspace (Zaius)

    • IBM (Witherspoon)

    • Modern code stack: D-bus, journaling

    • Error logs with part numbers, location codes

    • Interface traces and simulation

    • OpenPOWER toolkit enabled

    • Extendable – add your own features/functions


  • On Chip Controller (OCC) Overview

    What is OCC?
    • Hardware/Firmware that controls system power, performance & thermals
    • 405 processor with 512k dedicated RAM
    • General Purpose Engines (GPE) to offload the 405

    What does OCC do?
    • Reads/controls system power
    • Reads/controls chip temperatures
    • Enables efficient fan control
    • Provides OT protection
    • Power Capping
    • Fault Tolerance
    • Energy saving
    • Performance boost
    • Enables sophisticated measurement and profiling

    [Diagram, physical paths not shown. Blocks: Processor (405, GPE), Memory, Power Measurement, VRMs, Fan Control Actuation, Opal Sys Mem, BMC, HostBoot, a second Processor (405). Edge labels: OCC runs on; Register Reads/Writes; Measures/Actuates; Reads; Writes; Communicates; Loads; Loaded and initialized by; Communicates to other OCC via.]

  • Hardware Block Diagram

    Main Memory
    • Initial Load
    • Temperature Sensing
    • Utilization Measurement
    • Bandwidth control
    • OPAL communication
    • OCC communication

    Processor
    • Temperature Sensing
    • Core Frequency Control – PSTATES
    • Chip Voltage Control – PSTATES
    • Utilization Measurement

    BMC
    • Report power/temp
    • Provide Power Cap
    • DCMI compliance
    • Fan Control

  • Power/Thermal Management

    Linux Governors – OCC measures/clips for capping/OT
    • Balanced
    • Performance
    • Power save

    OCC Controls
    • Maximum Frequency
    • Power Capping
    • Over Temperature Protection (Processors/Memory)
    • Fault Tolerance (AC feed or power supply loss)

    Processor Idle States
    • Nap
    • Sleep
    • Winkle

    Power Capping
    • DCMI commands to set the power limit
    • BMC can set N and N+1 system power caps
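
As a rough illustration of setting a power cap through DCMI, the sketch below builds `ipmitool dcmi power` command lines from Python. The slides do not name a tool, so using `ipmitool` is an assumption, as are the helper names, the host, the user, and the 500 W value. Nothing is executed unless `run()` is called against a reachable BMC.

```python
import subprocess


def dcmi_power_cmd(*args, host=None, user=None):
    """Build an ipmitool argv list for one 'dcmi power' subcommand."""
    base = ["ipmitool"]
    if host is not None:
        if user is None:
            raise ValueError("a user is required for a remote BMC")
        # -E makes ipmitool read the password from IPMI_PASSWORD.
        base += ["-I", "lanplus", "-H", host, "-U", user, "-E"]
    return base + ["dcmi", "power", *args]


def set_power_cap_cmds(watts, **conn):
    """Commands that set a power limit in watts and then activate it."""
    if watts <= 0:
        raise ValueError("power cap must be positive")
    return [
        dcmi_power_cmd("set_limit", "limit", str(int(watts)), **conn),
        dcmi_power_cmd("activate", **conn),
    ]


def run(cmd):
    """Execute one command and return its standard output."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Hypothetical BMC address and user; print the commands instead of running them.
    for cmd in set_power_cap_cmds(500, host="bmc.example.com", user="admin"):
        print(" ".join(cmd))
```

`dcmi_power_cmd("reading")` and `dcmi_power_cmd("get_limit")` build the matching read-back commands.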

  • OCC Voting Box

    [Diagram: OCC frequency voting box inside the OCC Complex]

    • Frequency Voting Box (every 250us): lowest Pstate (freq) wins
    • Inputs: desired PSTATE from o/s (linux), maximum PSTATE, thermal control vote, power control vote
    • Output: actual PSTATE, sent to the PMC (hardware that actuates V/F slew), which sets VRM and frequency
    • Core DTS: read every 2ms, average all DTS to get core temp and take hottest core temp
    • Thermal control vote (every 16ms): read current 2ms hottest core temp, compare temp to limit and set vote
    • Power measurement hardware: read power every 250us
    • Power control vote (every 250us): read current 250us power reading, compare power to limits and set vote
    • BMC: gets temps from OCC in poll response for export and for fan control; gets power from OCC in poll response for export
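
One way to read the voting box is sketched below. Votes are modeled as frequencies in MHz, since the slide treats Pstate and frequency interchangeably. The step-up/step-down controller, the `Limits` fields, and every numeric value are assumptions for illustration; the slides only say that each control compares a reading to a limit and sets a vote.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    f_min_mhz: int
    f_max_mhz: int
    step_mhz: int = 50  # hypothetical vote step size


def hottest_core_temp(dts_by_core):
    """Average each core's DTS readings, then take the hottest core."""
    return max(sum(dts) / len(dts) for dts in dts_by_core)


def control_vote(current_vote, measured, limit, lim):
    """Lower the vote when over the limit, raise it again when under."""
    if measured > limit:
        return max(lim.f_min_mhz, current_vote - lim.step_mhz)
    return min(lim.f_max_mhz, current_vote + lim.step_mhz)


def voting_box(desired, maximum, thermal_vote, power_vote):
    """Lowest frequency wins."""
    return min(desired, maximum, thermal_vote, power_vote)


if __name__ == "__main__":
    lim = Limits(f_min_mhz=2000, f_max_mhz=3500)
    temp = hottest_core_temp([[70, 72], [88, 90], [60, 61]])
    thermal = control_vote(3500, temp, 85.0, lim)    # over the temp limit
    power = control_vote(3500, 900.0, 1000.0, lim)   # under the power cap
    print(temp, thermal, power, voting_box(3500, 3500, thermal, power))
```

In this run the hottest core averages above the assumed 85 C limit, so the thermal vote drops by one step and becomes the winning (lowest) frequency.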

  • Measurement – Out of Band

    [Diagram: Node containing CPU, Host OS, OCC, power/thermal measurement hardware, main memory, and BMC]

    IB pathway
    • Tighter workload coupling
    • Performance concerns
    • Jitter concerns

    OOB pathway
    • Measure without affecting system
    • Difficult to correlate to system jobs/events

    BMC
    • IPMI Sensors: Inlet Temp, Proc Core Temps, Mem Temps, All Power Rails
    • DCMI Commands: Node Power
    • AMESTER: IPMI OCC pass through

  • AMESTER Overview

    AMESTER – Automated MEasurement for System Temperature and Energy Reporting

    [Diagram blocks: AMESTER, Internet, POWER8 system, Service Processor, P8 chip, OCC, Measurement Hardware]

    Uses management network and service processors
    • Does not interfere with operating system or workloads
    • Runs without requiring any additional installation on the server

    Sensor data collection - Power, Temperature, Performance measurement
    • All Power Rails, Core temps, Utilization, IPS, Bandwidth, Frequency, etc

    Continuous Graphing Mode

    Trace Buffer mode – output to file for import to Excel/Matlab
    • 250 µs tracing into 8KB buffer -- 1 sensor logged every 250 µs for 1 second
    • 2 ms tracing into 8KB buffer -- 1 sensor logged every 2 ms for 8 seconds

    Parameter setting: Power/Thermal Trip points, Pin Frequency, etc
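
The trace-buffer figures above can be checked with a little arithmetic. The sketch assumes each logged sample occupies 2 bytes, which the slides do not state; that size is simply what makes an 8KB buffer last about 1 second at 250 µs and about 8 seconds at 2 ms.

```python
BUFFER_BYTES = 8 * 1024
BYTES_PER_SAMPLE = 2  # assumption, not stated in the slides


def trace_capacity(sample_period_s,
                   buffer_bytes=BUFFER_BYTES,
                   bytes_per_sample=BYTES_PER_SAMPLE):
    """Return (number of samples, seconds of trace) for one sensor."""
    samples = buffer_bytes // bytes_per_sample
    return samples, samples * sample_period_s


if __name__ == "__main__":
    for label, period in (("250 us", 250e-6), ("2 ms", 2e-3)):
        samples, seconds = trace_capacity(period)
        print(f"{label}: {samples} samples, {seconds:.3f} s")
```

Under that assumption the buffer holds 4096 samples in both modes, so only the sample period changes the trace length.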

  • Simple Example of Sensor Data

  • Sensor Description

  • Sample Sensor list

    440 Total Sensors – Power, Thermal, Performance, Utilization

  • Insights

    • Visualization is key to rapid prototyping and problem solving
    • Correlation of power consumption with other metrics
    • Time-alignment of sensor data is crucial for debugging firmware algorithms
    • Examples
      • Measuring settling time of power capping controller after workload changes
      • Developing dynamic voltage/frequency scaling algorithms
      • Discovering small non-steady behavior when steady-state behavior was expected

  • Future – P9

  • Hardware Block Diagram – P9

    OCC (On Chip Controller)
    • WOF (Workload Optimized Frequency)

    Open BMC
    • FSI to SBE, not I2C
    • Move to RESTful interfaces

    Main Memory
    • Initial Load
    • Temperature Sensing
    • Utilization Measurement
    • OPAL communication

    New Sensors, In-band AMESTER

    Processor
    • Temperature Sensing
    • Core Frequency Control
    • Chip Voltage Control
    • Utilization Measurement
    • 24 cores (Quads)
    • Instant on

  • P9 OCC Complex with Quads/PPEs

    • Hypervisor makes V/F requests per core
    • Frequency is per Quad (highest Core request)
    • IVRMs allow per Quad Voltage
    • External Voltage is per Chip
    • Temp sensors
      • 2 per core
      • 1 per cache
      • 3 per nest
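
The per-quad frequency rule above (highest core request wins) can be sketched as follows. Grouping four cores per quad is taken from the quad naming and the 24-core chip; the function name and the request values are hypothetical.

```python
def quad_frequencies(core_requests_mhz, cores_per_quad=4):
    """Resolve per-core frequency requests into one frequency per quad.

    Each quad runs at the highest frequency requested by any of its cores.
    """
    if len(core_requests_mhz) % cores_per_quad != 0:
        raise ValueError("core count must be a multiple of cores_per_quad")
    return [
        max(core_requests_mhz[i:i + cores_per_quad])
        for i in range(0, len(core_requests_mhz), cores_per_quad)
    ]


if __name__ == "__main__":
    # Eight cores, i.e. two quads, with made-up requests in MHz.
    requests = [2000, 2500, 2200, 2100, 3000, 2000, 2000, 2000]
    print(quad_frequencies(requests))
```

A single busy core therefore pulls its whole quad up to its requested frequency, while other quads are unaffected.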

    [Diagram: P9 chip. OCC Complex: OCC (405), two GPEs, PGPE, SGPE, 768KB SRAM, AVSbus interfaces. Six quads, each with a DPLL, four SMT4 core chiplets, two CMEs, two NCUs, two L2s, and two 10MB L3s; twelve 32KB SRAMs in total. SBE (Self-Boot Engine) with 96KB PIB MEM and FastI2C. Three IO PPEs with 64KB SRAM each. PB PPE (Powerbus PPE) with 16KB SRAM. Note: also used for NV-link management.]

  • Workload Optimized Frequency (WOF)

    • Less active workloads can go to higher frequencies
    • Lower active core counts also increase top frequency potential
    • Actual PSTATE selected depends on linux Governor
    • As with Turbo, higher ambients will affect part to part determinism
    • Guaranteed no worse than existing turbo frequency
    • New OCC/OPAL interface
    • New Ultra Turbo Frequency point
    • Part to part determinism achieved by factoring out leakage current.

    [Figure: Frequency and Power axes, with Nominal Spec and Turbo Spec levels and the Turbo Freq and Ultra Turbo Freq points marked]

  • Graphics Processing Units (GPUs)

    What are these things (and why are they such a big deal)?
    • Roots in image rendering (what color should that pixel be?)
    • One operation on lots of data at the same time
    • Think weather simulation
    • Now used for AI/Machine learning
    • Key for speech/image recognition

    Challenges:
    • High Power
    • How to Maximize CPU/GPU performance
    • Data collection

  • CPU/GPU Interface

  • Measurement – In Band vs Out of Band

    [Diagram: Node containing GPUs, CPU, Host OS, OCC, main memory, and BMC/FSP]

    IB pathway
    • Tighter workload coupling
    • Performance concerns
    • Jitter concerns

    New for P9
    • Detailed power and performance data
    • In-Band AMESTER
    • GPU sensing and correlation

    OOB pathway
    • Measure without affecting system performance
    • Difficult to correlate to system jobs/events

    BMC/FSP – move to REST (Redfish)
    • IPMI/DCMI: Inlet Temp, Proc Core Temps, Mem Temps, All Power Rails, Node Power
    • CIM (POWERVM): buffers of power/thermal readings
    • Pass through (AMESTER): get anything

  • Measurement – In Band Sensors

    [Diagram: Node containing power/thermal measurement hardware, OCC, CPU, Host OS, main memory, and BMC/FSP]

    • 40+ different types of sensor readings
    • Power, performance, utilization
    • 400+ total sensors (24 cores scales this up)
    • Timestamped with the system timestamp
    • Accumulator and update tag support for energy calculations
    • Min/max support – and clearing of min/max
    • 4KB pushed up every 10ms
    • All sensors updated every 100ms
    • Read by the o/s, job profilers
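
A sketch of how an accumulator and update tag could support energy calculations is shown below. It assumes the accumulator is a running sum of power samples in watts and the update tag counts how many samples have been added. The slides do not define either field, so the `SensorSnapshot` layout and the numbers are hypothetical, and counter wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorSnapshot:
    timestamp_s: float  # system timestamp of the reading, in seconds
    accumulator: int    # assumed: running sum of power samples (W)
    update_tag: int     # assumed: number of samples summed so far


def average_power_w(first, last):
    """Average power between two snapshots of the same sensor."""
    samples = last.update_tag - first.update_tag
    if samples <= 0:
        raise ValueError("no new samples between snapshots")
    return (last.accumulator - first.accumulator) / samples


def energy_j(first, last):
    """Energy over the interval: average power times elapsed time."""
    return average_power_w(first, last) * (last.timestamp_s - first.timestamp_s)


if __name__ == "__main__":
    start = SensorSnapshot(timestamp_s=10.0, accumulator=1_000_000, update_tag=4000)
    end = SensorSnapshot(timestamp_s=12.0, accumulator=1_720_000, update_tag=4800)
    print(average_power_w(start, end), "W", energy_j(start, end), "J")
```

Working from deltas means a job profiler only needs one snapshot at job start and one at job end, rather than every intermediate reading.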

  • Summary

    Server Power Consumption is a large part of TCO
    • Data Center Power consumption is going up
    • Power/Performance are linked
    • GPUs provide amazing performance benefits, but also power challenges

    OpenPOWER enables innovation and custom solutions
    • Full Firmware stack and tools are open
    • Host Boot, OPAL, OCC, BMC, AMESTER
    • Detailed power/performance data available

  • References

    Open Power Blog link:
    http://openpowerfoundation.org/press-releases/occ-firmware-code-is-now-open-source/

    GitHub pages:
    OpenPOWER project: https://github.com/open-power
    OCC: https://github.com/open-power/occ
    Building OpenPOWER: https://github.com/open-power/op-build
    Open BMC: https://github.com/openbmc/openbmc
    AMESTER: https://github.com/open-power/amester
    OCC Sensor collection document: https://github.com/open-power/docs/blob/master/occ/OCC_ipmitool_sensors.pdf
