Etienne Le Sueur and Gernot Heiser
Dynamic Voltage and Frequency Scaling: The Laws of Diminishing Returns
HotPower’10, Vancouver, Canada, [email protected]
© NICTA 2010 www.ertos.nicta.com.au/research/power/ /35
What is DVFS?
Dynamic Voltage and Frequency Scaling
$P = C f V^2$
where $P$ is the dynamic power consumption, $C$ is a constant, $f$ is the frequency and $V^2$ is the voltage squared.
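As a rough illustration of the dynamic power relation P = C f V^2, the Python sketch below compares two operating points. The function names and the frequency and voltage values are hypothetical and are not taken from the talk; only the formula is.

```python
def dynamic_power(c, f, v):
    """Dynamic power P = C * f * V^2 (units follow the inputs)."""
    return c * f * v ** 2


def relative_dynamic_power(f_low, v_low, f_high, v_high):
    """Dynamic power at the low operating point relative to the high one.

    The constant C cancels in the ratio, so it is set to 1.0 here.
    """
    return dynamic_power(1.0, f_low, v_low) / dynamic_power(1.0, f_high, v_high)


# Hypothetical operating points (GHz, volts), chosen only for illustration.
ratio = relative_dynamic_power(f_low=1.0, v_low=1.0, f_high=2.0, v_high=1.25)
print(f"dynamic power at the low point is {ratio:.2f}x that of the high point")
```

Because voltage enters squared, halving the frequency while also lowering the voltage reduces dynamic power by more than half in this example.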
In the past...
“Under some conditions, we observe energy savings of 30% for a 4% performance loss.”
Snowdon et al. [2009]
“... in most of the traces the potential for energy savings is good. The savings range from about 5% to about 75%, with most data points falling between 25% to 65% savings.”
Weiser et al. [1994]
“Energy savings of 22% ... to complete the same task are possible without a substantial reduction in application performance...”
Weissel et al. [2002]
“... Hence, on this system, by lowering the frequency to a point where the workloads can be adequately served without sacrificing latency, energy is saved.”
Miyoshi et al. [2002]
The whole story
$P = C f V^2 + P_{static}$
where $P$ is the total power consumption, $C f V^2$ is the dynamic power consumption and $P_{static}$ is the static power consumption.
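To see why the static term matters, here is a minimal sketch of the energy needed to finish a fixed task under P = C f V^2 + P_static. It assumes a purely CPU-bound task (runtime = cycles / frequency). All names and numbers are hypothetical and chosen only to give round results; none come from the measurements in the talk.

```python
def task_energy(cycles, f_hz, volts, c, p_static_w):
    """Energy (J) to run a fixed number of cycles at one operating point."""
    runtime_s = cycles / f_hz                     # CPU-bound assumption
    power_w = c * f_hz * volts ** 2 + p_static_w  # P = C f V^2 + P_static
    return power_w * runtime_s


CYCLES = 2e9  # hypothetical workload size
C = 1e-8      # hypothetical constant, picked for round numbers
HIGH = dict(f_hz=2e9, volts=1.25)  # hypothetical fast operating point
LOW = dict(f_hz=1e9, volts=1.0)    # hypothetical slow operating point

for p_static_w in (0.0, 20.0):
    e_high = task_energy(CYCLES, c=C, p_static_w=p_static_w, **HIGH)
    e_low = task_energy(CYCLES, c=C, p_static_w=p_static_w, **LOW)
    print(f"P_static={p_static_w:4.1f} W  E_high={e_high:.2f} J  E_low={e_low:.2f} J")
```

With these made-up values, the slow operating point uses less energy when static power is zero, but more once static power is included, because the static term is paid for the whole (longer) runtime.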
Static power consumption
• CPU leakage
• Hard drives
• Memory
• Power supply losses
• etc.
How can DVFS save energy?
400c30: 48 8b 11       mov (%rcx),%rdx
400c33: 48 8b 42 48    mov 0x48(%rdx),%rax
400c37: 48 89 41 10    mov %rax,0x10(%rcx)
400c3b: 48 89 4a 48    mov %rcx,0x48(%rdx)
400c3f: 48 8b 51 08    mov 0x8(%rcx),%rdx
400c43: 48 8b 42 50    mov 0x50(%rdx),%rax
400c47: 48 89 41 18    mov %rax,0x18(%rcx)
400c4b: 48 89 4a 50    mov %rcx,0x50(%rdx)
400c4f: 48 83 c1 40    add $0x40,%rcx
[SPEC CPU2000, 181.mcf, gcc 4.2, high optimisation]
... such workloads can be memory-bound
[Figure: Normalised CPU cycles for 164.gzip and 181.mcf. x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised TSC; series: 164.gzip (cpu-bound), 181.mcf (memory-bound)]
[Figure: Runtime of 164.gzip (cpu-bound) vs. 181.mcf (memory-bound). x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised runtime; series: Runtime 181.mcf (memory-bound), Runtime 164.gzip (cpu-bound)]
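The difference between cpu-bound and memory-bound runtime can be sketched with a simple two-part model: time spent executing core cycles, which scales with frequency, plus time stalled on memory, which is assumed here not to scale with core frequency. This model and every number in it are illustrative assumptions, not the measured 164.gzip or 181.mcf data.

```python
def runtime(cpu_cycles, f_hz, mem_stall_s):
    """Runtime (s) = frequency-dependent compute time + fixed memory stall time."""
    return cpu_cycles / f_hz + mem_stall_s


def slowdown(cpu_cycles, mem_stall_s, f_low, f_high):
    """Runtime at f_low relative to runtime at f_high."""
    return runtime(cpu_cycles, f_low, mem_stall_s) / runtime(cpu_cycles, f_high, mem_stall_s)


F_LOW, F_HIGH = 0.8e9, 2.7e9  # frequency range endpoints shown on the plots
CYCLES = 2.7e9                # hypothetical: 1 s of compute at 2.7 GHz

cpu_bound = slowdown(CYCLES, mem_stall_s=0.0, f_low=F_LOW, f_high=F_HIGH)
mem_bound = slowdown(CYCLES, mem_stall_s=1.0, f_low=F_LOW, f_high=F_HIGH)  # hypothetical 1 s of stalls
print(f"cpu-bound slowdown:    {cpu_bound:.2f}x")
print(f"memory-bound slowdown: {mem_bound:.2f}x")
```

Under this model the memory-bound workload loses less performance when the frequency is lowered, which is the opportunity DVFS tries to exploit.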
Our analysis
3 generations of AMD Opteron CPUs in server-class systems
Die codename       Sledgehammer
Year               2003
Core count         1
Frequency range    0.8 - 2.0 GHz
Voltage range      0.9 - 1.5 V
Process            130 nm
TDP                89 W
Die area           193 mm²
Transistor count   106 M
Sledgehammer
[Figure: Energy and runtime of 181.mcf on Sledgehammer. x-axis: Frequency (GHz), 0.8 to 2.0; y-axis: Normalised energy/runtime; series: Energy, Runtime]
Our analysis
Die codename       Sledgehammer    Santa Rosa
Year               2003            2006
Core count         1               2
Frequency range    0.8 - 2.0 GHz   1.0 - 2.4 GHz
Voltage range      0.9 - 1.5 V     0.9 - 1.35 V
Process            130 nm          90 nm
TDP                89 W            95 W
Die area           193 mm²         230 mm²
Transistor count   106 M           243 M
Santa Rosa
[Figure: Energy and runtime of 181.mcf on Santa Rosa. x-axis: Frequency (GHz), 1.0 to 2.4; y-axis: Normalised energy/runtime; series: Energy and Runtime for 1 instance and for 2 instances]
Our analysis
Die codename       Sledgehammer    Santa Rosa      Shanghai
Year               2003            2006            2009
Core count         1               2               4
Frequency range    0.8 - 2.0 GHz   1.0 - 2.4 GHz   0.8 - 2.7 GHz
Voltage range      0.9 - 1.5 V     0.9 - 1.35 V    1.0 - 1.35 V
Process            130 nm          90 nm           45 nm
TDP                89 W            95 W            75 W
Die area           193 mm²         230 mm²         285 mm²
Transistor count   106 M           243 M           463 M
Shanghai
[Figure: Energy and runtime of 181.mcf on Shanghai. x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised energy/runtime; series: Energy and Runtime for 1, 2 and 4 instances]
Results
DVFS on Shanghai is ineffective for saving energy!
... but why?
Static power
DVFS can only change dynamic power consumption, and static power is increasing!
What are the trends?
• scaling of transistor technology;
• increasing memory performance;
• improved idle/sleep modes; and
• multi-core processors.
Scaling of transistor technology
In ~7 years:
• 130 nm to 45 nm
• 106 M to 463 M transistors
• 193 mm² to 285 mm²
• 1 MiB to 8 MiB total SRAM cache
• Single to quad-core (more later...)
Static power is increasing significantly, dynamic power is decreasing
Increasing memory performance
In ~7 years:
• Much larger CPU SRAM caches (fewer cache misses)
• DRAM throughput: 3.2 GB/s to 5.33 GB/s
• Larger DRAM prefetch distance: 2 to 3 cache lines
• Dual to triple channel DDR memory controllers
Fewer pipeline stalls for memory references means code is less memory-bound
Improved idle/sleep modes
In ~7 years:
• From a single 'halt' mode (ACPI C1)
• To multiple stepped low-power sleep modes (C1 - 4)
Push towards race-to-halt
Multi-core processors
Today:
• Single off-chip voltage regulator module (VRM)
• One or more on-chip PLL clock generator modules
Per-core DVFS adds complexity to hardware and DVFS algorithms
How much energy can we really save?
Up to now, we've ignored the performance loss...
We can:
• use a different benchmarking methodology;
• use a different metric;
• or both.
Padding methodology
• Extend shorter executions;
• Add idle energy
[Figure: Padded vs. non-padded benchmarks. x-axis: Time, with t_high and t_low marked; y-axis: Power consumption; regions: High frequency, Low frequency, Static power, Idle energy]
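The padding step can be sketched as follows: every run is extended to the length of the longest run, and the extension is charged at idle power. The function name, the idle power and the measurements below are hypothetical placeholders, not values from the talk.

```python
def padded_energy(run_energy_j, run_time_s, longest_time_s, idle_power_w):
    """Energy of a run padded with idle time up to the longest run's duration."""
    if run_time_s > longest_time_s:
        raise ValueError("run is longer than the padding target")
    return run_energy_j + idle_power_w * (longest_time_s - run_time_s)


# Hypothetical measurements: frequency (GHz) -> (energy in J, runtime in s)
runs = {0.8: (150.0, 25.0), 2.7: (120.0, 10.0)}
IDLE_POWER_W = 4.0  # hypothetical idle power

longest = max(time_s for _, time_s in runs.values())
for freq, (energy, time_s) in sorted(runs.items()):
    padded = padded_energy(energy, time_s, longest, IDLE_POWER_W)
    print(f"{freq} GHz: raw {energy} J, padded {padded} J")
```

In this made-up example the fast run uses less energy before padding but more after it, which shows how strongly the result depends on what the system does (and draws) once the work is finished.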
[Figure: Padded energy for 181.mcf on Shanghai. x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised energy; series: Padded energy for 1, 2 and 4 instances]
What about performance?
Energy-delay product
[Figure: Padded energy-delay product for 181.mcf on Shanghai. x-axis: Frequency (GHz), 0.8 to 2.7; y-axis: Normalised padded EDP; series: EDP for 1, 2 and 4 instances]
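The energy-delay product (EDP) multiplies energy by runtime, so a setting that saves a little energy but runs much longer is penalised. A minimal sketch, using hypothetical measurements rather than the data behind the plot:

```python
def energy_delay_product(energy_j, runtime_s):
    """EDP = energy x delay; lower is better."""
    return energy_j * runtime_s


# Hypothetical measurements: frequency (GHz) -> (energy in J, runtime in s)
measurements = {0.8: (150.0, 25.0), 1.5: (140.0, 15.0), 2.7: (160.0, 10.0)}

edp = {f: energy_delay_product(e, t) for f, (e, t) in measurements.items()}
best = min(edp, key=edp.get)                           # frequency with the lowest EDP
normalised = {f: v / edp[best] for f, v in edp.items()}  # 1.0 marks the best setting

for f in sorted(normalised):
    print(f"{f} GHz: normalised EDP {normalised[f]:.2f}")
```

Here the highest frequency has the lowest EDP even though it does not have the lowest energy, because its shorter runtime outweighs the extra energy.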
The future...
• 32 nm process already in production;
• 22 nm and smaller are on the horizon;
• caches will get bigger;
• leakage power will rise;
• memory performance will continue to improve; and
• entry/exit costs to/from sleep modes will improve.
What about DVFS?
A glimpse
[Figure: Energy, runtime and EDP for Westmere (Core i5-540M). x-axis: Frequency (GHz), 1.199 to 3.066; y-axis: Normalised energy, runtime and EDP; series: Runtime, Energy, Energy-delay product; annotation: Turbo Boost]
Concluding remarks
• Transistor scaling is causing higher proportions of static leakage power;
• smaller core voltages mean DVFS has less range to play with;
• improving memory performance means fewer opportunities to reduce CPU frequency without significant loss of performance;
• sleep/idle modes are becoming much more efficient;
• DVFS implementations on multi-core processors are more complex and the cost-benefit is small;
• Optimal energy-efficiency is achieved by running fast.
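The shrinking voltage range can be quantified from the voltage ranges listed for the three Opteron generations. Since dynamic power scales with V^2, the sketch below computes (V_min / V_max)^2 for each die, that is, the share of the V^2 factor that remains at the lowest voltage. It deliberately ignores the frequency term and static power, so it is only a partial view; the helper name is an assumption.

```python
# Voltage ranges (volts) as listed in the CPU comparison table.
VOLTAGE_RANGE = {
    "Sledgehammer": (0.9, 1.5),
    "Santa Rosa": (0.9, 1.35),
    "Shanghai": (1.0, 1.35),
}


def min_v_squared_ratio(v_min, v_max):
    """V^2 factor at the lowest voltage relative to the highest voltage."""
    return (v_min / v_max) ** 2


ratios = {die: min_v_squared_ratio(*rng) for die, rng in VOLTAGE_RANGE.items()}
for die, ratio in ratios.items():
    print(f"{die}: V^2 factor at minimum voltage = {ratio:.2f} of maximum")
```

The ratio moves closer to 1 with each generation, meaning voltage scaling alone can remove a smaller share of dynamic power on the newer parts.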
Future direction
• SIMD workloads
  – SSE/3DNow! workloads require many 64-bit operands, which could increase memory-boundedness
• Periodic workloads
  – MPEG video/audio playback, analyse race-to-halt
  – usefulness of sleep modes
• Analyse and compare the trends for embedded devices
  – Mobile phones and netbooks
Questions?
Intel Atom N270
[Figure: Energy, runtime and EDP for Atom N270. x-axis: Frequency (GHz), 0.8 to 1.6; y-axis: Normalised energy, runtime and EDP; series: Runtime, Energy, EDP]
Intel Core 2 Duo (Ultra-low Voltage)
[Figure: Energy, runtime and EDP for MacBook Air (Core2 Duo, ULV). x-axis: Frequency (GHz), 0.8 to 1.6; y-axis: Normalised energy, runtime and EDP; series: Runtime, Energy, EDP]
From imagination to impact