IBM, November 5, 2012

Storage Advanced Management Controller for
Energy Efficiency (SAMCEE – a Part of the GAMES EU
Project)

Contributors:
Ronen Kat, Doron Chen, George Goldberg, Dmitry Sotnikov,
Ealan Henis
Agenda

- Background and Motivation
- Formal Problem Statement
- Technical Challenges
- Technical Solution Details
- Experimental Data
- Results and Publications
- Conclusions
Background and Motivation

- Energy efficiency has become important, not only performance; hence the EU-funded project
- Partners: ENG, POLIMI, TUC, HLRS, Christmann, ENERGOECO, ENELSi
- Need energy-aware management of storage
- Scope is the data center
- Most savings in storage may be attained by shutting down storage components
- However, the MAID approach is not acceptable:
  - Performance (delays caused by spinning up devices)
  - Data safety (data loss when trying to spin up)
  - Need a different approach
[Figure: GAMES architecture overview. The Energy Efficiency Assessment tool spans the IT Infrastructure, the Energy Practice Knowledge Base, the Migration Controller, the Acoustic Mode Controller, and a Data Mining component. Data Mining consumes raw monitoring data and produces patterns (e.g., file usage, file dependencies), GPIs/KPIs, and system usage info. The ESMI interface carries provenance data: BP annotations, GPI evaluations, critical components, enabled actions, notifications, controller rules, files-for-application, performed actions, results, and the migration plan. Server data and server storage performance are monitored (aggregated dB for acoustics). The GAMES Dashboard supports editing, initialization, migration plan execution, semi-automatic scheduling, and GPI management, and reports violations. Application metadata (requirement modifications such as thresholds and data dependencies, BP annotations) flows to Data Storage.]
Formal Problem Statement

Goal
- Reduce energy consumption in the storage system

Means
- Energy-aware storage management
- Via control of data placement and disk (acoustic) modes

Input data for control obtained from
- Dedicated storage monitoring mechanisms
- GAMES ESMI and databases
- Application-level annotations, application-to-file mapping
Technical Challenges

- How do we save energy while preserving performance?
- What are the guiding principles for controller design? E.g.:
  - Client/server architecture
  - Separate control of data placement and disk acoustic modes
  - Spin down disks, but not ones that are in use by any application
- What are the control policies? E.g.:
  - Ranking-based data placement
  - Usage-centric efficiency metrics
  - File splitting into smaller chunks for finer granularity
Technical Solution Details

- Device ranking based on usage-centric metrics
  - Space/power, IOPS/power, MBPS/power
  - Similar in form to potential-value metrics, but actual usage is the focus
- Placement policy based on ranking (select high ranks)
- Files are split into chunks; each chunk is placed separately
- Round-robin chunk placement (top ranks) for performance
- Storage tiers used implicitly (via ranking)
- Separate control of data placement and disk acoustic modes
- Separate spin-up/down controller for unused disks
  - Keep spares ready for intact performance
- NFS NAS server with standard NFS client access
- Online new-chunk allocation and mode control
- Offline chunk migration (usage statistics available)
- Fuzzy control for modes; rules are based on a previous study
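As a rough illustration of the split-and-place policy described above, a file can be cut into fixed-size chunks that are spread round-robin over the top-ranked devices. All names here (split_into_chunks, place_chunks, the device names) are illustrative, not taken from the actual SAMCEE implementation.

```python
# Sketch of ranking-based, round-robin chunk placement.
# Names and rank values are illustrative only.

def split_into_chunks(file_size, chunk_size):
    """Return a list of chunk sizes covering file_size."""
    chunks = [chunk_size] * (file_size // chunk_size)
    if file_size % chunk_size:
        chunks.append(file_size % chunk_size)
    return chunks

def place_chunks(chunks, device_ranks, rr_size):
    """Place chunks round-robin over the rr_size top-ranked devices."""
    # Highest-ranked devices first; rr_size controls access parallelism.
    top = sorted(device_ranks, key=device_ranks.get, reverse=True)[:rr_size]
    return [(chunk, top[i % len(top)]) for i, chunk in enumerate(chunks)]

ranks = {"ssd1": 0.9, "sas1": 0.7, "sas2": 0.6, "sata1": 0.3}
placement = place_chunks(split_into_chunks(25, 10), ranks, rr_size=2)
# Three chunks (10, 10, 5) alternate over the two top-ranked devices.
```

A larger rr_size spreads load (and spin-up activity) over more disks; a smaller one concentrates it, which is the trade-off explored in the RR=8/RR=3/RR=1 experiments later in the deck.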
Architecture – Client-Side View

[Figure: Applications 1 and 2 on the client host annotate and access file1 and file2 through the client host file system and a standard NFS client. The NAS server runs the NFS server, a file mapper, a data mover, and the migration and disk mode controller in front of the backend storage. See server details on the next slide.]
SAMCEE NAS – Server-Side View

[Figure: File accesses arrive at the NFS server. The file mapper/splitter splits file1 into file1a and file1b and maintains the current map (e.g., file1a on Tier1 Disk3, file1b on Tier1 Disk5). The placement controller consumes annotations and monitored storage data and produces a desired map (e.g., file1b moved to Tier3 Disk2), which the data mover executes as a migration plan. The disk mode controller sets acoustic modes on the backend storage tiers: Tier1 SSD, Tier2 SAS, Tier3 SATA.]
List of SAMCEE Components

Basic level (5 modules)
- Linux kernel extension
- Three service agents: performance and power low-level data collection, stored into a MySQL DB
- FUSE user-level file splitter and new-chunk placement
- Nagios storage system reporting

Mid level (5 modules)
- Application-to-file mapping, annotations, and EPKB access
- Device power modeling
- Application power modeling and reporting
- Event handling: metric updates and migration requests
- Spin-up/down device scripts (system dependent)

High level (4 modules)
- Device ranking based on usage-centric efficiency metrics
- Spin-up/down controller and spare-disk logic
- Disk acoustic mode control logic (fuzzy) and actuation
- Migration planning and execution
Integration with GAMES Components

SAMCEE receives (via PTIEvent):
- Changes to usage-centric metric weights
- Enable/disable of disk acoustic mode control
- Migration requests
- Application annotations, application-to-file mapping
- Loading of the configuration file from the EPKB

SAMCEE sends:
- System and application power via the Nagios NRPE plug-in
- Critical-condition PTIEvents
Potential Savings Estimates

- Energy savings depend on the performance/runtime effects
- The performance hit will be minor via NFS
- Savings estimated at 50% of the power difference between all disks down and all disks up
- IBM testbed:
  - 120 W vs. 150 W: 10% without the SAMCEE server
  - 360 W vs. 390 W: 4% with the SAMCEE server
- HLRS testbed:
  - 168 W vs. 226 W: 13% with the SAMCEE server
- ENG testbed:
  - 350 W vs. 410 W: 7% without the SAMCEE controller
  - 600 W vs. 660 W: 4% with the SAMCEE controller
  - No disk mode control available
  - Spinning down only 3 disk pairs out of 6 also limits energy savings
Storage Reference Platform

- FUSE was chosen for simplicity of implementation
- In a production system, the file splitter should be implemented in the kernel
- GAMES performance and energy benchmarks should use NFS with FUSE storage and compare:
  - SAMCEE enabled vs. SAMCEE disabled
  - We are testing SAMCEE's control, not FUSE efficiency
Disk Acoustic Mode Control

- A Fuzzy Inference System (FIS) interpolates well between extreme cases with known solutions
- Inputs to the disk mode controller:
  - Portion of sequential disk accesses, disk queue length, I/O read/write response times, average disk capacity used, average disk usage (IOPS and throughput)
- Extreme cases are based on lessons from a previous investigation of disk acoustic modes
- Examples of the mode selection policy:
  - For sequential I/Os prefer Normal mode; else (random I/Os):
  - Small or medium IOPS and RT: prefer Quiet mode
  - High IOPS and medium or high RT: prefer Normal mode
- Mode control in SAMCEE is re-activated every 10 seconds
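A minimal sketch of how the example policy above could map disk statistics to a mode. The membership function, thresholds, and names are illustrative placeholders, not SAMCEE's actual fuzzy inference system.

```python
# Toy acoustic-mode selector following the example policy above.
# Thresholds and names are illustrative, not SAMCEE's real rules.

def membership(x, low, high):
    """Linear membership in [0, 1]: 0 below low, 1 above high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def select_mode(seq_ratio, iops, response_time_ms):
    """Return 'Normal' or 'Quiet' per the example rules."""
    # Rule: sequential workloads prefer Normal mode.
    if seq_ratio > 0.5:
        return "Normal"
    # Random I/Os: fuzzify IOPS and response time.
    high_iops = membership(iops, 100, 200)
    high_rt = membership(response_time_ms, 10, 30)
    # High IOPS with medium/high RT -> Normal; otherwise Quiet.
    normal_strength = min(high_iops, max(high_rt, 0.5))
    return "Normal" if normal_strength > 0.5 else "Quiet"

print(select_mode(0.8, 50, 5))    # sequential -> Normal
print(select_mode(0.1, 60, 8))    # small IOPS, low RT -> Quiet
print(select_mode(0.1, 250, 40))  # high IOPS, high RT -> Normal
```

A real FIS evaluates many such rules in parallel and defuzzifies a weighted output; this sketch only shows how the linguistic variables translate into graded memberships rather than hard thresholds.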
Easy Changes to the FIS via an XML init File

<FuzzyRules>
  <Rule>
    <RuleName>zeroRule</RuleName>
    <SimpleAntecedent>
      <InputName>DiskSequentialAccessRatio</InputName>
      <LinguisticVar>Non-sequential</LinguisticVar>
    </SimpleAntecedent>
    <SimpleAntecedent>
      <InputName>DiskIOPSRate</InputName>
      <LinguisticVar>Small</LinguisticVar>
    </SimpleAntecedent>
    <SimpleAntecedent>
      <InputName>DiskResponseTime</InputName>
      <LinguisticVar>Low</LinguisticVar>
    </SimpleAntecedent>
    <SimpleAntecedent>
      <InputName>DiskInMode</InputName>
      <LinguisticVar>NormalInMode</LinguisticVar>
    </SimpleAntecedent>
    <OutputLinguisticVar>DiskQuietMode</OutputLinguisticVar>
  </Rule>
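A sketch of how such a rule file could be loaded with standard-library XML parsing. The tag names match the snippet above; the loader itself and its output shape are illustrative, not SAMCEE's actual parser.

```python
# Parse a FIS rule definition like the XML snippet above.
# Only the tag names come from the slide; the loader is illustrative.
import xml.etree.ElementTree as ET

RULE_XML = """
<FuzzyRules>
  <Rule>
    <RuleName>zeroRule</RuleName>
    <SimpleAntecedent>
      <InputName>DiskIOPSRate</InputName>
      <LinguisticVar>Small</LinguisticVar>
    </SimpleAntecedent>
    <OutputLinguisticVar>DiskQuietMode</OutputLinguisticVar>
  </Rule>
</FuzzyRules>
"""

def load_rules(xml_text):
    """Return each rule as {name, antecedents, output}."""
    rules = []
    for rule in ET.fromstring(xml_text).findall("Rule"):
        antecedents = [
            (a.findtext("InputName").strip(), a.findtext("LinguisticVar").strip())
            for a in rule.findall("SimpleAntecedent")
        ]
        rules.append({
            "name": rule.findtext("RuleName").strip(),
            "antecedents": antecedents,
            "output": rule.findtext("OutputLinguisticVar").strip(),
        })
    return rules

rules = load_rules(RULE_XML)
print(rules[0]["name"], "->", rules[0]["output"])  # zeroRule -> DiskQuietMode
```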
Chunk Placement Control

- Based on storage device ranking
  - Ranks calculated from usage-centric energy efficiency metrics
- Need to consider actual (usage) energy efficiency for the standard capacity-per-Watt, I/O-per-Watt, and throughput-per-Watt metrics
- Usage metrics are directly linked to user applications
  - Derived from dynamically collected storage usage and power statistics
- Device ranks are periodically recalculated using a weighted sum of the 3 metrics
- Storage tiers are used implicitly through ranks
- Data consolidation is obtained automatically by a high weight for the capacity metric
- Device performance is treated as a constraint
- Round-robin size over the top ranks -> performance
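The weighted-sum ranking described above can be sketched as follows. The function name, device names, weights, and sample values are all illustrative assumptions, not measured data or SAMCEE code.

```python
# Sketch of usage-centric device ranking as a weighted sum of
# three efficiency metrics. Names and numbers are illustrative.

def rank_device(used_gb, iops, mbps, power_w, weights):
    """Weighted sum of capacity/W, IOPS/W, and MB/s-per-W usage metrics."""
    w_cap, w_iops, w_tp = weights
    return (w_cap * used_gb / power_w
            + w_iops * iops / power_w
            + w_tp * mbps / power_w)

# A high capacity weight drives data consolidation: well-utilized,
# low-power devices rank highest and attract new chunks.
weights = (0.8, 0.1, 0.1)
devices = {
    "ssd1":  dict(used_gb=60,  iops=500, mbps=200, power_w=2),
    "sas1":  dict(used_gb=100, iops=150, mbps=120, power_w=12),
    "sata1": dict(used_gb=900, iops=40,  mbps=60,  power_w=8),
}
ranks = {name: rank_device(weights=weights, **d) for name, d in devices.items()}
best = max(ranks, key=ranks.get)
```

With these weights the heavily used, low-power SATA device ranks first, so new data consolidates onto it; shifting weight toward the IOPS and throughput terms would instead favor the SSD, which is how the metric weights (adjustable via PTIEvent) steer placement.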
Demo and Experimental Data

- 10-minute demo of basic SAMCEE operations
- Disk acoustic mode control
- Automatic spin-up/down of unused/used disks
- Followed by experimental data from 3 testbeds
IBM Testbed Disk Modes Data

- SAMCEE automatically selects the optimal acoustic mode
- Heuristic algorithm and fuzzy inference system
- Preliminary tests use stable, unvarying mini-benchmarks
- The best mode was selected for the entire run
- To improve test accuracy, the same workload was run on all disks

For 5000 IOs, 16 threads, requested IO rate above 150 IOPS:

Mode            Runtime (s, ±1)   Power (W, ±2)   Energy (J, ±50)
Normal (-52%)   43                192             8256
Quiet           99                175             17325

For 5000 IOs, 16 threads, requested IO rate of 80 IOPS:

Mode            Runtime (s, ±1)   Power (W, ±2)   Energy (J, ±50)
Normal          62                196             12152
Quiet (-11%)    62                174             10788
IBM Testbed Spin-Down Data

- Synthetically generated micro-benchmark runs on the HRL testbed
  - Uniform fixed-work runs (preset total amount of IOs, concurrency, and IO rate)
- Energy consumption data (1 JBOD, 8 SATA disks)
- Measured JBOD power values (W, ±2):

  SpinDown   Idle   MaxLoad(Q)   MaxLoad(N)
  121        152    172          192

- Measured JBOD power values (W, ±2) at idle with:

  0 disks   2 disks   4 disks   6 disks   8 disks
  70        90-100    116       134       152

- Each idle disk contributes approximately 10 W
  - An additional 2.5/5 W for a 150 random-IOPS load in Quiet/Normal acoustic mode
- Depending on the characteristics of the workload, energy may be saved by selecting an appropriate Quiet/Normal mode
  - Depends on the ratio of power saved vs. runtime prolongation
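The Quiet-vs-Normal trade-off above is a simple energy comparison: energy is power times runtime, so the lower-power mode only wins when it does not stretch the runtime too much. A sketch using the numbers from the mode tables earlier in the deck (the helper itself is illustrative):

```python
# Energy = power x runtime; Quiet mode wins only when its power
# saving outweighs the runtime prolongation. Numbers come from the
# IBM testbed mode tables; the helper function is illustrative.

def energy_j(power_w, runtime_s):
    """Energy in joules for a constant-power run."""
    return power_w * runtime_s

# Rate-limited workload (80 IOPS): same 62 s runtime, so the
# lower-power Quiet mode saves energy (~11%).
quiet_wins = energy_j(174, 62) < energy_j(196, 62)

# Unthrottled workload (>150 IOPS): Quiet stretches 43 s -> 99 s,
# so Normal uses far less energy (~52%) despite higher power.
normal_wins = energy_j(192, 43) < energy_j(175, 99)
```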
Load per Disk is Low for RR=8

[Chart: JBOD power (W) over time (min), without SAMCEE, RR_Size=8; y-axis roughly 146-162 W]

- Average power = 157 W
- Runtime = 120 min
- Energy = 157 W x 120 min = 18840 W-min (about 1130 kJ)
- Workload: locally copy 200 GB of 10 MB files, 20 concurrent threads
Load per Disk is High for RR=1

[Chart: JBOD power (W) over time (min), SAMCEE with RR_Size=1; y-axis roughly 105-140 W]

- Average power = 130 W
- Runtime = 170 min
- Energy = 130 W x 170 min = 22100 W-min (about 1326 kJ)
- Note the spin-up/down effects
- Note the prolonged runtime
- Workload: locally copy 200 GB of 10 MB files, 20 concurrent threads
Load with RR=3 Allows Optimization

[Chart: JBOD power (W) over time (min), without SAMCEE, RR_Size=3; y-axis roughly 150-164 W]

- Average power = 158 W
- Runtime = 100 min (shorter than for RR=8)
- Energy = 158 W x 100 min = 15800 W-min (about 948 kJ; lower than for RR=8)
- Workload: locally copy 200 GB of 10 MB files, 20 concurrent threads
Load with RR=3 Allows Optimization

[Chart: JBOD power (W) over time (min), SAMCEE with RR_Size=3; y-axis 0-160 W, with dips as disks spin down]

- Average power = 137 W (-13%; -5% including the 240 W server)
- Runtime = 100 min (similar to the runtime without SAMCEE)
- Energy = 137 W x 100 min = 13700 W-min (about 822 kJ; -13%, -5%)
- Workload: locally copy 200 GB of 10 MB files, 20 concurrent threads
IBM Testbed Migration Results

Used disks before migration     Power before (W,           Power after   Savings (%)
(data can be squeezed to 2)     including 2 spare disks)   (W, R=2)
>=14                            314                        228           27.4
12                              301                        228           24.2
10                              288                        228           20.8
8                               272                        228           16.2
6                               267                        228           14.6
4                               242                        228           5.8
2                               228                        228           0
HLRS Testbed Results

HPC Simulation (1), storage power only

Runtime (min)   Power (W)   Controller   SAMCEE   Energy (kJ)   Savings
128             231.1       no           no       1775          reference
125             213.7       no           yes      1603          9.7%
122             229.9       fuzzy        no       1683          5.2%
117             210.2       bio          no       1476          16.8%
132.8           209.8       fuzzy        yes      1672          5.8%
130.6           214.8       bio          yes      1683          5.2%
HLRS Testbed Results

HPC Simulation (2), storage power only

Runtime (min)   Power (W)   Controller   SAMCEE   Energy (kJ)   Savings
59              231.7       no           no       820           reference
61              211.9       no           yes      776           5.4%
63              228.0       fuzzy        no       862           -5.1%
64              228.3       bio          no       877           -6.9%
65              211.6       fuzzy        yes      825           -0.6%
66              213.0       bio          yes      843           -2.8%
HLRS Testbed Results

HPC eBusiness GLC (3), storage power only

Runtime (min)   Power (W)   Controller   SAMCEE   Energy (kJ)   Savings
87              228.3       no           no       1192          reference
96              209.6       GLC          yes      1207          -1.3%
94              210.4       GLC          yes      1187          0.5%

- The global server controller saves a lot of energy (e.g., 20-30%) by turning off servers
- A strong performance penalty is introduced by GLC when the virtual machines are booted
Storage Power Reference Run

[Chart: power (W) over time (min) for the control run (June01-2029); y-axis roughly 215-245 W over about 59 min]

- <P> = 231.7 W, T = 59 min, E = 820 kJ
- Devices not turned off
- Power consumed according to load
Storage Power with SAMCEE Only

[Chart: power (W) over time (min) for the run June04-1235, SAMCEE only; y-axis 0-250 W over about 61 min]

- <P> = 211.9 W, T = 61 min, E = 776 kJ
- Some devices turned off, no performance hit
- Some 5.4% in energy savings
Storage Power with Controllers

[Chart: power (W) over time (min) for the run June01-1350, Fuzzy + SAMCEE; y-axis 180-220 W over about 65 min]

- <P> = 211.6 W, T = 65 min, E = 825 kJ
- Some devices turned off, some performance hit
- No energy savings (-0.6% vs. the reference run)
HLRS Testbed Results

HPC eBusiness GLC (4), storage power and application energy

- The global server controller saves a lot of energy (e.g., 30-40%) by turning off servers
- A strong performance penalty is introduced by GLC when it boots new virtual machines

Runtime (min)   Power (W)   Controller   SAMCEE   Storage Energy (kJ)   Storage E Saved   App Server E (kJ)
87              209.7       no           no       1095                  reference         1263
87              228.3       no           no       1192                  reference         1239
96              209.6       GLC          yes      1207                  -2.8%             678
94              210.4       GLC          yes      1187                  -1.7%             670
HLRS Testbed Migration Results

Used disks before migration     Power before (W,           Power after   Savings (%)
(data can be squeezed to 2)     including 2 spare disks)   (W, R=2)
10-12                           239                        192           24.5
8                               231                        192           20.3
6                               221                        192           15.1
4                               207                        192           7.8
2                               192                        192           0
ENG Benchmark Reference Run

[Chart: power (W) over time (min), without SAMCEE, RR=2; y-axis roughly 585-625 W]

- Average power = 604.8 W
- Runtime = 109 min
- Energy = 3966 kJ
ENG Benchmark with SAMCEE

[Chart: power (W) over time (min), with SAMCEE, RR=2; y-axis roughly 540-600 W]

- Average power = 573.8 W
- Runtime = 105 min
- Energy = 3609 kJ (8.7% savings)
ENG Testbed Migration Results

Used disk pairs before migration   Power before (W,          Power after   Savings (%)
(data can be squeezed to 2)        including 1 spare disk)   (W, R=2)
5-6                                583                       533           9.4
4                                  568                       533           6.6
3                                  551                       533           3.4
1 or 2                             533                       533           0
Results

- Most savings come from spinning down unused disks
  - The most important metric is capacity usage efficiency
- For the dynamic scenario:
  - Potential savings of 0-13%, depending on the data access pattern
- For the static scenario (data migration):
  - Most application data is static after the run
  - Re-placing the data chunks has potential savings of 0-25%, depending on the initial over-provisioning of space
- Tiering and data consolidation, with improvements:
  - Augmented with energy considerations
  - Done at the data center level
- Energy and performance are traded off (weighted sum) via the usage energy efficiency metrics
- Interaction with other controllers (GAMES and/or HSM) is important and merits further investigation
- Access parallelism (rr_size) is (again) important
Publications

- "Usage Centric Green Performance Indicators", GreenMetrics 2011 workshop, ACM SIGMETRICS conference, June 7, 2011, San Jose, USA
  - To be adopted as a standard by the Green Grid consortium
- "ADSC: Application-Driven Storage Control for Energy Efficiency", 1st International Conference on ICT as Key Technology for the Fight against Global Warming (ICT-GLOW 2011), August 29 - September 2, 2011, Toulouse, France
- "Metrics for Energy Efficient Storage", GAMES whitepaper, available at http://www.green-datacenters.eu/ under Public docs, GAMES, StorageMetrics.IBM.WP6.V1.1.pdf, 2011
- "Setting Energy-Efficiency Goals in Data Centres: the GAMES Approach", Proc. E3DC workshop within E-Energy 2012, 1st International Workshop on Energy Efficient Datacentres, May 8-11, 2012, Madrid, Spain
Conclusions

- Novel energy-aware management of storage
- Major design decisions:
  - Separate control of data placement and disk modes
  - File-level data granularity with file splitting
  - Spin-down of only unused disks
  - Ranking-based data placement with usage-centric efficiency metrics
- Most savings obtained from spinning down disks
  - Data consolidation (data centers are over-provisioned)
  - Keep spare disks for performance
- Tests show 0-13% savings for the dynamic scenario and 0-25% for the static one
- No known comparable solutions in the market today
- Cross-controller interactions need further research
Backup Slides

Backups follow
HLRS Experimental System Storage Tiers

- Tier 1 storage
  - SSD (low capacity, high performance, low energy, very expensive)
  - 2x Intel SSD, 80 GB
- Tier 2
  - HDD (low capacity, high performance, high energy, high cost), SAS disks
  - 8x SAS, 147 GB, 15k rpm
- Tier 3
  - Low-tier disk (high capacity, low performance, low energy, low cost), SATA disks
  - 4x SATA, 1.5 TB, 7200 rpm
HLRS Storage Server Architecture

[Figure: x86 server running Scientific Linux (SL), 4x2 GB RAM, Intel Xeon DP L5630, exporting NFS over 1 Gbit Ethernet; attached storage: 2x Intel SSD 80 GB, 8x SAS 147 GB 15k rpm, 4x SATA 1.5 TB 7200 rpm; an energy sensor measures the system.]
ENG Storage Server Architecture

[Figure: x86 server running Scientific Linux (SL), 16 GB RAM, Intel Xeon E5620, exporting NFS over 1 Gbit Ethernet; backend over 8 Gbit FC: a RAID of SATA 1 TB 7200 rpm disks and a RAID of SAS 450 GB 15k rpm disks; energy sensors measure the server and the storage.]
Development Storage Server Architecture (IBM)

[Figure: x86 server running Scientific Linux (SL), exporting NFS over 1 Gbit Ethernet; Xyratex JBODs attached via 6 Gbit SAS: 8x SATA 2 TB 7200 rpm and 8x SAS 300 GB 15k rpm; energy sensors measure the server and the JBODs.]
Simulation Results

[Figure: simulation results (GAMES, Palermo, 20-1-2010) showing the disk mode over time and the state after migration.]
Data Center Storage Efficiency Metrics

- IBM Haifa is actively involved in the Green Grid DCsE Task Force, driving the introduction of Data Center Storage Efficiency (DCsE) metrics
- The DCsE Task Force has acknowledged the storage metrics developed as part of GAMES and built upon these metrics

DRAFT

Data Center Storage Efficiency - Capacity (DCsE_cap):

    DCsE_cap = Data Center User Capacity in Use / Data Center Storage Power Consumption

Data Center Storage Efficiency - Workload (DCsE_io):

    DCsE_io = Data Center I/O Throughput / Data Center Storage Power Consumption

Data Center Storage Efficiency - Throughput (DCsE_tp):

    DCsE_tp = Data Center Data Transfer Throughput / Data Center Storage Power Consumption
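The three draft metrics above are simple ratios of useful work to storage power. A small sketch computing them; the units and sample values are illustrative, not measured data from any testbed.

```python
# Compute the three draft DCsE metrics defined above.
# Sample values and unit choices are illustrative only.

def dcse_cap(capacity_in_use_tb, storage_power_kw):
    """Capacity metric: TB in use per kW of storage power."""
    return capacity_in_use_tb / storage_power_kw

def dcse_io(io_throughput_iops, storage_power_kw):
    """Workload metric: IOPS per kW of storage power."""
    return io_throughput_iops / storage_power_kw

def dcse_tp(transfer_throughput_mbps, storage_power_kw):
    """Throughput metric: MB/s per kW of storage power."""
    return transfer_throughput_mbps / storage_power_kw

power_kw = 2.5
print(dcse_cap(100.0, power_kw))   # 40.0 TB/kW
print(dcse_io(50000, power_kw))    # 20000.0 IOPS/kW
print(dcse_tp(800.0, power_kw))    # 320.0 MB/s per kW
```

Higher values mean more useful capacity or work delivered per Watt; consolidating data and spinning down unused disks raises DCsE_cap directly, which is why the capacity-usage metric dominates the savings reported earlier.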