K. Kant, Modeling Challenges in Distributed Energy Adaptive Computing
Challenges in Distributed Energy Adaptive Computing
K. Kant
NSF and GMU
Information & Communication Technology (ICT) has a problem
Performance centric → Energy & sustainability centric
How do we get there?
ICT Power Growth until 2020
• Increase in spite of power efficient designs
  – Clients: 8X in number, 3X in power
  – Data Centers: > 2X increase
  – Network: 3X increase
[Figure: ICT power growth by segment. Labels: Clients; Network; Data Center; Transmission, conversion & distribution.]
Current State
Unsustainable Computing
Data Center Infrastructure
• Resource intensive: Water, cabling, metal, …
• ~50% power wasted before getting to racks
Distribution Infrastructure
[Figure: power distribution chain from the utility feed to the IT load. Voltage levels shown: 115 kV, 13.2 kV, 480 V, 208 V. Losses labeled: 0.3% (99.7% efficient), 0.5% (99.5% efficient), 1.0% (99.0% efficient), 6% (94% efficient), and ~1% in switchgear and conductors. Other labels: UPS; 2.5 MW generator (~180 gallons/hour); IT load.]
• ~10% distribution loss + high carbon impact
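The ~10% figure can be sanity-checked by chaining the loss values labeled in the distribution figure. The sketch below assumes the five labeled losses belong to five independent stages connected in series; the figure's layout did not survive extraction, so that reading is an assumption, not a statement of the original diagram.

```python
from math import prod

# Loss fractions as labeled in the figure: 0.3%, 0.5%, 1.0%, 6%, ~1%.
# Assumption: each value is a separate stage and all stages are in series.
STAGE_LOSSES = [0.003, 0.005, 0.010, 0.06, 0.01]

def end_to_end_efficiency(losses):
    """Fraction of input power that survives every stage in the chain."""
    return prod(1.0 - loss for loss in losses)

efficiency = end_to_end_efficiency(STAGE_LOSSES)
distribution_loss = 1.0 - efficiency
print(f"delivered: {efficiency:.3f}, lost: {distribution_loss:.3f}")
```

Under these assumptions the chained loss comes out slightly below 9%, which is in line with the slide's rounded "~10%".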
~50% Rack Power Wasted
Component           Total (W)  Used (W)  Comments
CPU                 80         60        Operating at 100% utilization
Fans                50         25        Temp. directed fan at 100% util
Memory (32 GB)      88         24        2GB DIMMs, 4W idle, 19W active
Hard drives         40         10        6 SATA drives, 25% busy
I/O adapters        20         4         25% disk, 15% network
Motherboard         22         12        N/S bridges & devices, VRs, …
Total DC power      300        135
Power supply loss   50         7         14% (total) / 5% (used) loss of AC input pwr
AC input power      350        142       > 50% of power is wasted
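The totals in the table above can be reproduced with a few lines of arithmetic. This is only a consistency check; treating the values as watts and the "Used" column as the power that does useful work are assumptions drawn from the table's comments.

```python
# (component, total, used) exactly as listed in the table; watts assumed.
COMPONENTS = [
    ("CPU", 80, 60),
    ("Fans", 50, 25),
    ("Memory", 88, 24),
    ("Hard drives", 40, 10),
    ("I/O adapters", 20, 4),
    ("Motherboard", 22, 12),
]
PSU_LOSS_TOTAL, PSU_LOSS_USED = 50, 7

dc_total = sum(total for _, total, _ in COMPONENTS)
dc_used = sum(used for _, _, used in COMPONENTS)
ac_total = dc_total + PSU_LOSS_TOTAL   # AC input power, "Total" column
ac_used = dc_used + PSU_LOSS_USED      # AC input power, "Used" column

# Share of the AC input that is not accounted for by the "Used" column.
wasted_fraction = (ac_total - ac_used) / ac_total
print(dc_total, dc_used, ac_total, ac_used, f"{wasted_fraction:.0%}")
```

The component rows sum to the stated 300 and 135, and the wasted share works out to roughly 59%, matching the "> 50% of power is wasted" remark.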
Sustainable Computing
Renewable Energy Push
• Limit energy draw from grid
  – Less infrastructure
  – Lower losses
  – but variable supply
Need better power adaptability
High Temperature DC’s
• Chiller-less operation
  – Less energy/materials, but space inefficient
• High temperature operation
  – Smaller Toutlet − Tinlet
  – More throttling
  – More failure prone (?)
Need smarter thermal adaptability
Overdesign
• Overdesign is the norm today
  – Huge power supplies, fans, heat sinks, server cases, high rack capacity, UPS capacity, …
  – Engineered for worst case → rarely encountered
  – Huge power wastage, waste of materials, energy, …
Better energy adaptability to deal w/ frugal design
[Figure: Efficiency vs. Load. PSU efficiency (axis range 50 to 90) plotted against output load (axis range 0 to 100), with low-efficiency and high-efficiency regions marked.]
• What if we right-size everything?
• Highly energy efficient but need smarter control
Energy Adaptive Computing
• EAC strives for dynamic end-to-end adjustment via
  – Workload adaptation for graceful QoS degradation under energy limitations
  – Infrastructure adaptation to cope with temporary energy deficiencies.
• Requires coordinated power/thermal mgmt of computation, network & storage.
• Enhances sustainability of IT infrastructure
EAC Instances
Client-server EAC
• Transparently adapt to client energy states
  – State = {on-AC, normal, low-battery, …}
  – Service contract Ci = {setup QoS, operational QoS}
• Adaptation Challenges
  – Communicating & enforcing contracts.
  – Group adaptation of clients forced by network/servers ?
Cluster EAC
• Adaptation to intra & inter-DC limits
  – Multi-level: Server, rack & DC levels
• Adaptation Challenges
  – Estimate & collect power deficits/surplus at multiple levels
  – Coordination across large range of devices
    • Location based services
    • Coordination across levels
  – Simultaneously handle client-server loop
P2P EAC
• Adaptation based on “available energy”
• Content: video resolution, audio coding, …
• Network: modulate wireless radio usage (?)
• Energy proportional use of peer resources
• Energy driven content replication & reorganization
• Adaptation Challenges
  – Satisfying QoS ?
  – Balancing src/dest usage vs. relay node energy usage ?
Challenges
Some specific Issues
Power Estimation Challenges
• Notion of effective power?
  – Additive relationship: Workload power
  – Why is this hard? Interference
• Available power
  – Determined by power, thermal & perhaps other issues (noise).
  – Required at multiple levels: facility, enclosure, machine, …
Network Role in EAC
• Energy Adaptation
  – Aggressive control of switch/router ports
    • Speed, state & width controls
  – Traffic consolidation across paths
• Adaptation induced congestion
  – Propagation (e.g., ECN, EBCN) & response
• Computation – communication tradeoff ?
• Redirection ?
• Network protocol support for adaptation?
Other Issues
• EAC Security
  – Attacks on power sources
  – Energy Attacks on IT, e.g.,
    • Demanding too much, cyclic demands, …
• Storage adaptation
  – Storage devices, controllers & network.
• Coordinated end to end control is hard!
• Formal models to understand impact of energy adaptation.
Energy Adaptation in Data Centers
Adaptation Methods
• Workload Adaptation
  – Coarse grain: Shut down low priority tasks
  – Fine grain: Graceful QoS degradation, e.g.,
    • Batched service, poorer resolution, …
• Infrastructure Adaptation
  – Operation at lower speeds (DVFS)
  – Effective use of low power modes & “width” control.
• Workload adaptation always done first
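The ordering rule (workload adaptation first, infrastructure adaptation only for what remains) can be sketched as a small planner. Everything below is a hypothetical illustration: the `Task` fields, the `sheddable` flag, and the choice to try fine-grain degradation before coarse-grain shutdown are assumptions, since the slide fixes only the workload-before-infrastructure order.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int       # assumption: larger means more important
    power_w: float      # power at full QoS
    min_power_w: float  # power at the lowest acceptable QoS
    sheddable: bool     # assumption: only these may be shut down

def plan_adaptation(tasks, budget_w):
    """Return (actions, uncovered deficit in watts) for a power budget."""
    actions = []
    deficit = sum(t.power_w for t in tasks) - budget_w
    by_priority = sorted(tasks, key=lambda t: t.priority)
    # Workload adaptation, fine grain: degrade QoS, lowest priority first.
    for t in by_priority:
        if deficit <= 0:
            break
        if t.power_w > t.min_power_w:
            actions.append(("degrade", t.name))
            deficit -= t.power_w - t.min_power_w
    # Workload adaptation, coarse grain: shut down sheddable tasks.
    # Reaching this loop with a deficit means every task is already degraded.
    for t in by_priority:
        if deficit <= 0:
            break
        if t.sheddable:
            actions.append(("shutdown", t.name))
            deficit -= t.min_power_w
    # Infrastructure adaptation covers whatever deficit is left.
    if deficit > 0:
        actions.append(("infrastructure", "DVFS / low power modes"))
    return actions, max(deficit, 0.0)

tasks = [Task("batch", 1, 100.0, 60.0, True), Task("web", 5, 200.0, 150.0, False)]
print(plan_adaptation(tasks, 100.0))
```

With a 100 W budget this plan degrades both tasks, shuts down the batch task, and still leaves 50 W for infrastructure adaptation to absorb.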
Infrastructure Adaptation
• Need a multilevel scheme
  – Individual “assets” up to entire data center
• Need both supply & demand side adaptations
Supply Side Adaptation
• Supply side Limits
  – Hard caps at higher levels (true limit) vs. “soft” (artificial) caps at lower levels.
  – Limits may be a result of thermal/cooling issues.
• Load consolidation
  – An essential part of energy efficient operation
  – Load consolidation vs. soft capping
• Need to address workload adaptation changes as a result of supply increase & decrease.
Demand Side Adaptation
• Adaptation to fluctuating demand
  – Transactional workload: Migrate queries or app VMs?
• Issues w/ combined supply & demand side adaptations
  – Imbalance: One node squeezed while other has surplus power
  – Ping-pong Control: Oscillatory migration of workload
  – Error accumulation down the hierarchy.
A Proposed Algorithm
• Unidirectional control
  – Load migration moves up the hierarchy, from local to global.
  – Local migrations are temporary & do not trigger changes to “soft” caps on supply.
• Target Node selection
  – Based on bin packing (best-fit decreasing)
  – Allows for more imbalance, which can be exploited for workload consolidation
• Properties
  – Avoids ping-pong, attempts to minimize imbalance
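Best-fit decreasing itself is a standard bin-packing heuristic, and a minimal version of target node selection might look like the sketch below. It is not the talk's implementation: the function name, the use of watts for demands and surplus, and the handling of unplaced workloads are all assumptions.

```python
def best_fit_decreasing(demands, surplus):
    """Place workloads on nodes with spare power, largest demand first.

    demands: {workload: watts needed}; surplus: {node: spare watts}.
    Returns (placement, unplaced). Inputs are not modified.
    """
    remaining = dict(surplus)
    placement = {}
    unplaced = []
    # "Decreasing": handle the largest demands first.
    for workload, watts in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        fits = [(spare, node) for node, spare in remaining.items() if spare >= watts]
        if not fits:
            unplaced.append(workload)
            continue
        # "Best fit": the node whose spare power is tightest but sufficient.
        spare, node = min(fits)
        placement[workload] = node
        remaining[node] = spare - watts
    return placement, unplaced

print(best_fit_decreasing({"a": 50, "b": 30, "c": 10}, {"n1": 60, "n2": 35, "n3": 100}))
```

In this example node n3 receives nothing even though it has the most surplus. That is the kind of imbalance the slide says can be exploited for consolidation, since an untouched node is a candidate for a low power mode. Workloads returned as unplaced would presumably be passed to the next level up, in keeping with the unidirectional control rule.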
Experimental Results
• Scenario
  – 3 levels, 18 identical servers (4+4 + 5+5)
  – 3 applications, total of 25 app instances
  – Any app can run on any server
  – Demand Poisson (active power ∝ utilization)
Migration Frequency
• Migration drivers: consolidation vs. energy deficiency
  – Low util → consolidation, high util → energy deficiency
• Other characteristics
  – Migration frequency low in all cases
  – No ping-pong observed
Thermal Impacts
• Additional Issues
  – Energy consumption limited by thermal/cooling issues, not energy availability
  – Migrations required to limit temperature
• Temperature & power have nonlinear relationship
• Need to account for both power & thermal effects
Results w/ Thermal Effects
• Imbalanced cooling
  – Servers 1-14: Ta = 25 °C, Servers 15-18: Ta = 40 °C
  – Temperature limit: 65 °C
• Power demand is adjusted by the alg. to account for higher temperature
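To see why hotter inlet air forces a lower power cap, the sketch below uses a deliberately simplified steady-state model, T = Ta + R·P, with a made-up thermal resistance R. This linear model is a stand-in chosen for illustration, not the talk's model (which the slides describe as nonlinear), and `THERMAL_RESISTANCE_C_PER_W` is a hypothetical constant.

```python
T_LIMIT_C = 65.0                    # temperature limit from the slide
THERMAL_RESISTANCE_C_PER_W = 0.125  # hypothetical: degrees C of rise per watt

def ambient_c(server_id):
    """Ambient temperature per the scenario: servers 1-14 cool, 15-18 hot."""
    return 25.0 if server_id <= 14 else 40.0

def thermal_power_cap_w(server_id):
    """Largest steady-state power that keeps the server at or below the limit,
    under the simplified linear model T = Ta + R * P."""
    headroom_c = T_LIMIT_C - ambient_c(server_id)
    return headroom_c / THERMAL_RESISTANCE_C_PER_W

caps = {server: thermal_power_cap_w(server) for server in range(1, 19)}
print(caps[1], caps[18])
```

Whatever the true thermal model, the headroom shrinks from 40 °C to 25 °C for servers 15-18, so their allowable power drops and load has to be shifted toward the cooler servers.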
Conclusions
• Need to go beyond energy efficiency
  – Design devices/systems to minimize life-cycle energy footprint
  – Creatively adapt to available energy to operate “at the edge”
• Ongoing/future work
  – Coordinated server, network & storage mgmt.
  – Explore tradeoffs between QoS, power savings and admission control performance
Thank you!
Power Inefficiencies
[Figure: power delivery chain inside a server. Labels, in extraction order: Server PSU; Rack supply; 70-90% efficient; ±12, ±5V; Voltage regulators; 90-95% efficient; CPU; Wasted leakage & clock power; Fans; DRAM & Mem controller; Adapters; Storage; 280V; 95% efficient; Idle wasted power.]
Operating Regimes
[Figure: Performance vs. relative power requirements (axis ticks 0, 1.0, 2.0, 3.0, 4.0). Regimes labeled: Energy plenty computing; Energy adaptive computing; Energy deficient computing; Energy efficient computing.]
So, What’s the Problem
• Local constraints & controls → end-to-end impacts
  – DC to DC load shift
    • Service disruption & post-shift impact
  – Client request to alter content
    • Less or more work for server
• Potential conflicting controls
[Figure: end-to-end topology. Labels: Client (x2); Network (x2); Core Network; Server1 + storage (DC1); Server2 + storage (DC2).]