Date posted: 11-Dec-2015
NO POWER STRUGGLES: COORDINATED MULTI-LEVEL POWER MANAGEMENT FOR THE DATA CENTER
Ramya (UCSB), Parthasarathy et al (HP Labs)
Overview
Power delivery, consumption and cooling problems in a data center are currently tackled by several systems, each addressing a separate aspect of these problems, either locally or globally, in hardware or software.
When these systems are deployed simultaneously, the policies of one tend to interfere with those of the others.
Overview…
The lack of coordination amongst such systems leads to undesirable consequences.
This paper proposes a “Global Power Management Solution” that coordinates these individual solutions.
Classifying the existing power management solutions..
Approach used: localized/distributed resource management, VMs
Power control : voltage scaling, power states, turning off machines
Implementation scope: server/cluster/data center level
Optimization requirements and constraints: accept performance loss? Allow power budget violations?
In a nutshell..
“Tracking” problem – optimize power consumption while delivering performance.
“Capping” problem – Optimize power provisioning and cooling so as not to violate the power budget.
“Optimization” problem – maximize power saving while minimizing performance loss. (ACPIs, VMs, etc)
Representative Power Management Solutions
Efficiency Controller (EC, tracking): optimizes per-server average power consumption. Adjusts ACPI P-states based on past resource usage to meet estimated future demand.
Server Manager (SM, capping): moves a server to a lower-power P-state when its power budget is violated.
Representative solutions..
Enclosure Manager (EM): thermal power capping at the blade level.
Group Manager (GM): thermal power capping at the rack or data center level.
These two monitor power usage on sets of machines and re-provision power to maintain the group power budget (determined manually or mandated by higher-level power managers).
Representative solutions..
Virtual Machine Controller (VMC): reduces average power usage across a set of machines by workload consolidation, turning off idle machines, etc.
Power Struggles..
What happens if these solutions are deployed simultaneously?
Power Struggles - examples
The EC and the SM both operate on the same knob/actuator (P-state) but for different metrics. If uncoordinated, the EC can overwrite the P-state set by the SM, leading to power budget violations and eventual thermal failover. This is a correctness issue.
Examples..
If the VMC and the group cappers are uncoordinated, the VMC can consolidate more workload onto a collection of servers than the group power budget allows.
Besides causing excessive performance violations (inefficiency), the VMC can react to the lower utilization (a side effect of power capping) and pack even more workloads onto the server, leading to a vicious cycle and system instability.
Design Challenges of a Coordination System
Interaction between different controllers (EC, SM, EM, etc.) must maintain “correctness, stability and efficiency”.
Global Awareness of the “presence” of other controllers while having minimal/zero knowledge of their properties.
Adaptability and Scalability – new controllers with same/different properties, new applications, etc.
Design Challenges - Sensitivity Issues
Overlapping functionalities and policies of controllers: can they be mitigated?
Is the Coordinated Management System agnostic to the deployed systems and applications (workloads)?
The Design
The Design..
Use of feedback control loops.
Measure the required “metric”, compare with the “reference” value and manipulate the actuator based on the error so that the output follows the reference.
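The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not the controllers from the paper: the function names, the gain and the toy "plant" (whose output is simply twice the actuator setting) are all assumptions chosen so the loop is easy to follow.

```python
def feedback_step(reference, measured, actuator, gain, lo, hi):
    """One loop iteration: move the actuator in proportion to the error."""
    error = reference - measured
    new_value = actuator + gain * error
    return max(lo, min(hi, new_value))  # keep the actuator within its range


def run_loop(reference, plant, actuator, gain, lo, hi, steps):
    """Repeatedly measure the plant output and correct the actuator."""
    history = []
    for _ in range(steps):
        measured = plant(actuator)
        actuator = feedback_step(reference, measured, actuator, gain, lo, hi)
        history.append((actuator, measured))
    return history


# Toy plant (assumption): the measured metric is twice the actuator setting.
history = run_loop(reference=10.0, plant=lambda a: 2.0 * a,
                   actuator=0.0, gain=0.25, lo=0.0, hi=100.0, steps=50)
final_actuator, _ = history[-1]
```

With these made-up numbers the actuator settles near 5.0, the setting at which the toy plant's output equals the reference of 10.0.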
Details..
Diagram: Efficiency Controller (EC)
Reference utilization r_ref
Actual utilization r_i
If r_i < r_ref, adjust actuator A (the P-state), e.g. step down from P0 to P4, resulting in higher utilization and lower power usage.
Details..
Diagram: Server Manager (SM)
Power capping by measuring per-server power consumption.
If current consumption exceeds the power budget, the SM increases r_ref, thereby allowing the EC to move the machine to a lower-power P-state.
In effect, the EC and the SM use r_ref as a communication channel.
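A small sketch may help show how r_ref works as the only channel between the two controllers. Everything numeric here is hypothetical: the P-state table, the linear power model, the step size and the assumption that the SM runs once every five EC steps are illustration choices, not values from the paper.

```python
# Hypothetical P-state table: (relative frequency, full-load power in watts).
P_STATES = [(1.0, 100.0), (0.9, 88.0), (0.8, 77.0), (0.7, 67.0), (0.6, 58.0)]


def utilization(demand, pstate):
    """Utilization seen at a P-state; demand is a fraction of P0 capacity."""
    freq, _ = P_STATES[pstate]
    return min(1.0, demand / freq)


def power(demand, pstate):
    """Toy linear power model: half of full-load power is an idle floor."""
    _, full = P_STATES[pstate]
    return 0.5 * full + 0.5 * full * utilization(demand, pstate)


def ec_step(pstate, demand, r_ref):
    """EC: track r_ref by moving one P-state at a time."""
    if utilization(demand, pstate) > r_ref and pstate > 0:
        return pstate - 1  # too busy: move to a faster P-state
    slower = pstate + 1
    if slower < len(P_STATES) and demand / P_STATES[slower][0] <= r_ref:
        return slower      # under-utilized: slow down to save power
    return pstate


def sm_step(r_ref, measured_power, budget, step=0.05, r_max=1.0):
    """SM: on a budget violation, raise r_ref instead of touching the P-state."""
    if measured_power > budget:
        return min(r_max, r_ref + step)
    return r_ref


def simulate(demand, budget, r_ref=0.75, steps=20, sm_period=5):
    pstate = 0
    for k in range(1, steps + 1):
        pstate = ec_step(pstate, demand, r_ref)
        if k % sm_period == 0:  # assumption: the SM runs on a slower time scale
            r_ref = sm_step(r_ref, power(demand, pstate), budget)
    return pstate, r_ref, power(demand, pstate)


pstate, r_ref, watts = simulate(demand=0.5, budget=55.0)
```

In this toy run the EC alone settles at P3, which draws more than the 55 W budget. The SM never writes the P-state; it raises r_ref twice, after which the EC itself moves to P4 and the power falls under the budget.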
Design..
EM & GM:
Same principle as the SM. Compare current power usage against the reference power budget and assign new budgets to the lower-level controllers (EM -> SM, GM -> EM) based on some policy (FIFO, random, etc.).
The lower-level controllers pick the “minimum of upper level recommendation and their own local power budget”.
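The minimum rule can be written down directly. The function name and the wattages below are hypothetical and only illustrate the rule.

```python
def effective_budget(local_budget, upper_recommendations):
    """Budget a controller enforces: the tightest of its own local budget
    and any budgets recommended by the levels above it."""
    return min([local_budget, *upper_recommendations])


# A server with a 250 W local budget whose EM recommends 220 W.
capped = effective_budget(250.0, [220.0])
# The same server when the EM recommendation is looser than the local budget.
uncapped = effective_budget(250.0, [300.0])
```

Taking the minimum means a higher-level manager can tighten a budget but never loosen one that a lower level has set for its own reasons.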
Design..
VMCs:
Use Actual utilization instead of “apparent” utilization (100% at P0 is not same as 100% at P3).
Supplied with data about approx power budget at various levels.
Also supplied with data about current power budget violations at various levels (through CIM)
The above three enable the VMCs to consolidate the right workloads and to make sure that the consolidated servers neither violate the power budgets nor fall into the vicious cycle mentioned earlier.
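The difference between apparent and actual utilization can be illustrated as follows. The relative frequencies are hypothetical, and the sketch assumes that server capacity scales linearly with frequency, which the slides do not state.

```python
# Hypothetical frequency of each P-state relative to P0.
REL_FREQ = {"P0": 1.0, "P1": 0.85, "P2": 0.7, "P3": 0.55}


def actual_utilization(apparent, pstate):
    """Convert utilization measured at a P-state into a fraction of the
    capacity the server would have at P0 (assumes linear scaling)."""
    return apparent * REL_FREQ[pstate]


busy_at_p0 = actual_utilization(1.0, "P0")
busy_at_p3 = actual_utilization(1.0, "P3")
```

Under these assumptions, a server that looks 100% busy at P3 is using only 55% of its P0 capacity, so a VMC that reasons on apparent utilization would misjudge how much work the server is really carrying.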
Summary of changes to the controllers
Modeling the Controllers
Power – Performance Model – run actual workloads on hardware at different utilization levels and measure the power and performance.
Through curve-fitting of the measured data, obtain linear models that represent the controller behavior.
Modeling..
EC: the control output is scaled up or down by λ (changes are proportional to the error in utilization).
r_ref is increased by the SM when the local power budget cap_loc is violated, resulting in the EC moving the machine to a lower-power P-state.
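One way to write these two update rules is sketched below. These are simplified forms consistent with the description above, not the paper's exact equations, which may carry extra scaling terms. The symbol f(k) for the EC's frequency (P-state) setting, the step indices k and j, and the use of β_loc as a step-size factor are assumptions.

```latex
% Sketch only: simplified update laws, not copied from the paper.
% f(k): assumed symbol for the EC's frequency (P-state) setting at step k.
\begin{align}
  e(k)   &= r_{\mathrm{ref}} - r(k) \\
  f(k+1) &= f(k) - \lambda \, e(k) \\
  r_{\mathrm{ref}}(j+1) &= r_{\mathrm{ref}}(j)
      + \beta_{\mathrm{loc}} \bigl(\mathit{pow}(j) - \mathit{cap}_{\mathrm{loc}}\bigr)
      \quad \text{when } \mathit{pow}(j) > \mathit{cap}_{\mathrm{loc}}
\end{align}
```

When utilization r(k) is below r_ref the error is positive and the frequency is lowered. When measured power exceeds cap_loc, r_ref grows, which makes that error larger and pushes the EC toward lower-power states.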
Modeling..
SM: manipulates r_ref of the EC if the server's power consumption violates cap_loc, subject to a cap determined by the β_loc factor.
EM & GM: operate on a fair-share policy; the power allocated to a component is proportional to the power it consumed in the last interval.
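The proportional fair-share rule can be sketched as below. The function name, the component names and the equal-split fallback for an all-idle group are assumptions made for illustration.

```python
def fair_share(group_budget, last_interval_power):
    """Split a group power budget in proportion to the power each component
    consumed in the last interval."""
    total = sum(last_interval_power.values())
    if total == 0:
        # Assumption: with no consumption history, split the budget equally.
        equal = group_budget / len(last_interval_power)
        return {name: equal for name in last_interval_power}
    return {name: group_budget * used / total
            for name, used in last_interval_power.items()}


# Hypothetical enclosure with a 600 W budget and three blades.
shares = fair_share(600.0, {"blade1": 100.0, "blade2": 200.0, "blade3": 100.0})
```

Here blade2 consumed half of the enclosure's power in the last interval, so it receives half of the 600 W budget (300 W) and the other two blades receive 150 W each.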
Modeling..
VMCs: constrained optimization problem to map n VMs to m servers (decision variable matrix X).
Include total power consumption and migration overhead (α_M) in the objective.
Consider server capacity constraints.
Modeling VMCs..
Consider local, enclosure and group level power budget constraints
The level of consolidation is tuned by adjusting the power budget buffers based on the violations observed at different levels.
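A placement problem with these ingredients could be written roughly as follows. This is a sketch assembled from the bullets above and is not meant to reproduce the paper's own numbered equations. The symbols d_i (demand of VM i), C_j (capacity of server j), M(X) (number of migrations), E (an enclosure's set of servers) and the buffers b_loc, b_enc, b_grp are assumed names.

```latex
% Sketch only: symbols other than X and \alpha_M are assumed names.
\begin{align}
\min_{X}\quad & \sum_{j=1}^{m} \mathit{pow}_j(X) + \alpha_M \, M(X) \\
\text{s.t.}\quad & \sum_{j=1}^{m} x_{ij} = 1 && \forall i \\
& \sum_{i=1}^{n} d_i \, x_{ij} \le C_j && \forall j \\
& \mathit{pow}_j(X) \le \mathit{cap}_{\mathrm{loc},j} - b_{\mathrm{loc}} && \forall j \\
& \sum_{j \in E} \mathit{pow}_j(X) \le \mathit{cap}_{\mathrm{enc},E} - b_{\mathrm{enc}} && \forall E \\
& \sum_{j=1}^{m} \mathit{pow}_j(X) \le \mathit{cap}_{\mathrm{grp}} - b_{\mathrm{grp}},
  \qquad x_{ij} \in \{0, 1\}
\end{align}
```

Each VM is placed on exactly one server, no server is loaded beyond its capacity, and the power drawn at the server, enclosure and group levels stays below the respective budget minus its buffer. Widening a buffer makes the consolidation less aggressive at that level.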
Modeling VMCs..
Equations 1 to 6 depict a 0-1 integer optimization problem.
The authors use a greedy bin-packing algorithm that yields an approximately optimal solution for the placement of VMs.
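To give a feel for greedy bin packing, here is a first-fit-decreasing sketch. The slides do not say which greedy heuristic the authors use, so first-fit decreasing is a stand-in; power budgets and migration cost are left out, and the VM names and sizes (in percent of one server) are hypothetical.

```python
def greedy_place(vm_demands, capacities):
    """First-fit decreasing: place the largest VMs first, each on the first
    server that still has room. Returns {vm_name: server_index}."""
    free = list(capacities)
    placement = {}
    by_size = sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True)
    for vm, demand in by_size:
        for j, room in enumerate(free):
            if demand <= room:
                free[j] = room - demand
                placement[vm] = j
                break
        else:
            raise ValueError(f"no server has room for {vm}")
    return placement


# Four hypothetical VMs, three servers of capacity 100 each.
placement = greedy_place({"a": 50, "b": 40, "c": 30, "d": 20}, [100, 100, 100])
used_servers = set(placement.values())
```

In this example the four VMs fit on the first two servers, so the third stays empty and is a candidate for being turned off.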
Evaluation
How?
Real-time deployment in a data center or a full-system simulation? Impractical: the actual system being tested limits the set of use-case scenarios that can be studied.
Use of trace-driven simulation: real-world traces of enterprise deployments enable detailed workload modeling and evaluation of trade-offs at policy and system levels.
Metrics used
Aggregate Power Saving, performance loss and power budget violation at SM, EM and GM levels.
No peak power saving is measured.
No workload queuing, i.e. if the workload exceeds capacity, there is performance loss due to power capping. No demand carry-over.
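The sketch below shows how such metrics might be computed from traces when unmet demand is dropped rather than queued. The exact metric definitions are assumptions (performance loss as the fraction of demand not served, violations as the fraction of intervals over budget, savings relative to an unmanaged baseline), and the trace values are made up.

```python
def evaluate(demand, capacity, power, baseline_power, budget):
    """Per-interval traces in, three aggregate metrics out.
    Unmet demand is lost in the interval where it occurs (no carry-over)."""
    lost = sum(max(0.0, d - c) for d, c in zip(demand, capacity))
    total = sum(demand)
    perf_loss = lost / total if total else 0.0
    violation_rate = sum(1 for p in power if p > budget) / len(power)
    power_saving = 1.0 - sum(power) / sum(baseline_power)
    return perf_loss, violation_rate, power_saving


# Hypothetical four-interval traces.
perf_loss, violation_rate, power_saving = evaluate(
    demand=[50, 80, 120, 60],
    capacity=[100, 100, 100, 100],
    power=[200, 260, 310, 220],
    baseline_power=[400, 400, 400, 400],
    budget=300,
)
```

Only the third interval has demand above capacity, and the 20 units lost there are not added to the fourth interval's demand.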
Experimentation
180 workload traces (databases, web servers, remote desktops, e-commerce, etc.).
Create different types of mixes (real & synthetic) from this set to exercise different utilization scenarios.
Systems under test (SUT): a low-power blade server A and an entry-level 2U server B.
Experiment with different power budgets and also study the sensitivity of this architecture by varying the time constants.
Power – Performance models for Blade A and Server B
Results
Baseline: No power management
Results..
Base Results:
Coordinated: 64% reduction in power consumption, 3% performance degradation and 5% power budget violation.
Uncoordinated: 12% performance loss and 7% budget violation.
Sensitivity towards different systems:
Blade A: 5 P-states over a higher power range.
Server B: 6 P-states over a lower power range.
Blade A's absolute power saving > Server B's.
This implies that the “range of power control is more important than its granularity”.
Results..
Variation for different workloads
At low utilization, VMC is major contributor to savings (assuming idle machines are “turned off”).
As utilization increases, benefits of VMC decrease while the combination of EC & VMC is better (i.e. a Coordinated Solution is better than a single one).
If idle machines are not switched off, savings drop “significantly”!