
Optimal Power Allocation for Multiprogrammed Workloads on Single-chip Heterogeneous Processors...


Euijin Kwon (1, 2), Jae Young Jang (2), Jae W. Lee (2), Nam Sung Kim (2, 3)


Single-chip heterogeneous processors

• Compared to systems built from discrete components:
  - Lower communication overhead
  - Lower power consumption
  - Lower cost (less silicon)
  - Well suited to emerging applications that mix sequential and parallel processing

Examples: AMD's Llano, Intel's Sandy Bridge, and Samsung's Exynos (sources: AMD, Intel, and Samsung)


Challenges

• The performance of a single-chip heterogeneous processor (SCHP) is limited by its power budgets:
  - Total chip power budget
  - CPU and GPU power budgets

• Multiprogrammed workloads (MW) call for workload-aware power allocation
  - It must consider both program characteristics and evaluation metrics

How can we optimize overall performance within a limited power budget?


Outline

• Motivation
• Target platform: SCHP + MW
• Workload-aware power allocation
  - Characteristics of programs
  - Evaluation metrics
• Methodology
  - Power configuration
  - Benchmark programs
• Evaluation
• Algorithm
• Conclusion


Target platform: SCHP + MW

• 4-core CPU + 16-SM GPU
• Multiple voltage/frequency (V/F) domains with DVFS
• Two programs running concurrently
• Hardware resources divided evenly between the two programs

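[Figure: block diagram of the target platform. CPU Core0 to Core3 each have their own V/F domain (per-core), GPU0 and GPU1 each have their own V/F domain, and the memory controllers (MCs) have a separate V/F domain. The chip runs a multiprogrammed workload of Program 1 and Program 2.]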
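As a rough illustration, the platform description can be written down as data. This minimal Python sketch assumes that "evenly divided" means each program gets two CPU cores and one GPU half (GPU0 or GPU1) with 8 of the 16 SMs. The class names, the core-to-program mapping, and the 8-SM split are hypothetical and are not stated on the slide.

```python
from dataclasses import dataclass, field

NUM_CPU_CORES = 4
NUM_GPU_SMS = 16

@dataclass(frozen=True)
class VFDomain:
    """One independently scaled voltage/frequency (V/F) domain."""
    name: str
    kind: str  # "cpu_core", "gpu", or "mc"

@dataclass
class Partition:
    """Hardware given to one program (hypothetical mapping)."""
    program: str
    cpu_cores: list[int] = field(default_factory=list)
    gpu: str = ""
    gpu_sms: int = 0

# Domains as drawn in the block diagram: one per CPU core, one per GPU half,
# and one for the memory controllers.
DOMAINS = [VFDomain(f"CPUCore{i}", "cpu_core") for i in range(NUM_CPU_CORES)] + [
    VFDomain("GPU0", "gpu"),
    VFDomain("GPU1", "gpu"),
    VFDomain("MCs", "mc"),
]

def split_evenly(programs: list[str]) -> list[Partition]:
    """Divide cores and SMs evenly between exactly two programs (assumed mapping)."""
    if len(programs) != 2:
        raise ValueError("this sketch models the two-program case only")
    cores_per = NUM_CPU_CORES // 2
    sms_per = NUM_GPU_SMS // 2
    return [
        Partition(
            program=name,
            cpu_cores=list(range(k * cores_per, (k + 1) * cores_per)),
            gpu=f"GPU{k}",
            gpu_sms=sms_per,
        )
        for k, name in enumerate(programs)
    ]
```

With per-core CPU domains and one domain per GPU half, each program's CPU and GPU share can be scaled independently, which is what the later power configurations vary.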


Workload-aware power allocation

• Characteristics of programs
  - Non-uniform performance sensitivity to power allocation
• Evaluation metrics
  - Throughput vs. energy efficiency

[Figure: normalized throughput (0.8 to 2.0) vs. power allocation (28.6, 34.2, 39.8, 48.6, 59.0; same hardware) for compute-bound mri-q and memory-bound stream-copy. Annotation: "Allocating more power to mri-q".]
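To make "throughput vs. energy efficiency" concrete, here is a small Python sketch. It assumes throughput is measured in instructions per second (IPS) and energy efficiency in IPS/W, matching the units on the evaluation chart. Summing IPS across the two programs and dividing by their combined power is an assumption about how the metrics are aggregated. The `Measurement` type and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Measurement:
    """Measured behaviour of one program under a given power allocation."""
    instructions: float  # instructions retired in the interval
    seconds: float       # length of the interval
    watts: float         # average power drawn by the program's CPU + GPU share

def throughput_ips(ms: list[Measurement]) -> float:
    """Workload throughput: sum of per-program instructions per second."""
    return sum(m.instructions / m.seconds for m in ms)

def energy_efficiency_ips_per_watt(ms: list[Measurement]) -> float:
    """Energy efficiency: total IPS divided by total power (IPS/W)."""
    return throughput_ips(ms) / sum(m.watts for m in ms)

def best(allocations: dict[str, list[Measurement]],
         metric: Callable[[list[Measurement]], float]) -> str:
    """Name of the allocation that maximizes the given metric."""
    return max(allocations, key=lambda name: metric(allocations[name]))
```

An allocation that spends extra watts for a modest IPS gain can win on throughput and lose on IPS/W, so the two metrics may select different optimal points.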


Methodology: shared power budget

• The power budget can be changed for each CPU group and GPU (power configuration):
  - CPU 1, CPU 2: 17.4, 24.8, 34.2, 46.4, or 62.8 W
  - GPU 1, GPU 2: 11.2, 16.8, 22.4, 31.2, or 41.6 W
• Output: throughput and energy efficiency
• Total chip power budget = 100 W
• CPU power budget = 80 W
• GPU power budget = 64 W
• Baseline configuration
  - Evenly divided (25 W for each CPU/GPU group)
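The power levels and budgets above define a small search space. This Python sketch enumerates it. It assumes the CPU budget caps CPU 1 + CPU 2 at 80 W, the GPU budget caps GPU 1 + GPU 2 at 64 W, and the 100 W chip budget caps all four. The slide does not spell out how the budgets combine, and the function names are hypothetical.

```python
from itertools import product

# Per-component power levels (W) from the power-configuration slide.
CPU_LEVELS = (17.4, 24.8, 34.2, 46.4, 62.8)
GPU_LEVELS = (11.2, 16.8, 22.4, 31.2, 41.6)

TOTAL_BUDGET = 100.0  # total chip power budget (W)
CPU_BUDGET = 80.0     # assumed cap on CPU 1 + CPU 2 (W)
GPU_BUDGET = 64.0     # assumed cap on GPU 1 + GPU 2 (W)
EPS = 1e-9            # tolerance for floating-point sums such as 41.6 + 22.4

def is_feasible(cpu1: float, cpu2: float, gpu1: float, gpu2: float) -> bool:
    """True if a (CPU 1, CPU 2, GPU 1, GPU 2) setting respects all three budgets."""
    cpu = cpu1 + cpu2
    gpu = gpu1 + gpu2
    return (
        cpu <= CPU_BUDGET + EPS
        and gpu <= GPU_BUDGET + EPS
        and cpu + gpu <= TOTAL_BUDGET + EPS
    )

def feasible_configs() -> list[tuple[float, float, float, float]]:
    """Enumerate every grid point of the 5^4 space that stays within budget."""
    return [
        cfg
        for cfg in product(CPU_LEVELS, CPU_LEVELS, GPU_LEVELS, GPU_LEVELS)
        if is_feasible(*cfg)
    ]
```

Under these assumptions, a setting such as (17.4, 17.4, 41.6, 22.4) uses exactly the 64 W GPU budget and is still allowed, while giving both GPUs 41.6 W is not.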


Methodology: benchmark programs

• Six benchmark programs are used
• They are divided into three groups by characteristics: compute-bound, neutral, and memory-bound

Benchmark | Acronym | Source | Characteristics
Magnetic Resonance Imaging Q | MRQ | Parboil | Compute-bound
Stream Cluster | SCL | Rodinia | Compute-bound
Hotspot | HOT | Rodinia | Neutral
Sum of Absolute Differences | SAD | Parboil | Neutral
Stencil | STN | Parboil | Memory-bound
Stream Copy | SCP | CS Virginia | Memory-bound


Evaluation: case study 1 (compute- vs. memory-bound)

• 19% throughput improvement and 32% energy-efficiency improvement
• More power is allocated to the compute-bound program
• The optimal point varies with the evaluation metric


Evaluation: case study 2 (memory- vs. memory-bound)

• 10% throughput improvement and 32% energy-efficiency improvement
• Power is allocated equally
• Again, the optimal point depends on:
  - The evaluation metric
  - The workload characteristics (compute- or memory-bound)


Evaluation: variation of optimal configuration

• The optimal configuration depends on the programs' characteristics and the evaluation metric
• (C) compute-bound, (N) neutral, (M) memory-bound; all power values in W

P1 vs. P2 | Metric 1, throughput: P1 CPU / P1 GPU / P2 CPU / P2 GPU | Metric 2, energy efficiency: P1 CPU / P1 GPU / P2 CPU / P2 GPU
MRQ (C) vs. SCL (C) | 17.4 / 31.2 / 17.4 / 31.2 | 17.4 / 16.8 / 17.4 / 16.8
SCP (M) vs. STN (M) | 17.4 / 31.2 / 17.4 / 31.2 | 17.4 / 11.2 / 17.4 / 11.2
SAD (N) vs. HOT (N) | 17.4 / 31.2 / 17.4 / 31.2 | 17.4 / 11.2 / 17.4 / 16.8
MRQ (C) vs. SCP (M) | 17.4 / 41.6 / 17.4 / 22.4 | 17.4 / 22.4 / 17.4 / 16.8
SCL (C) vs. SCP (M) | 17.4 / 41.6 / 17.4 / 22.4 | 17.4 / 11.2 / 17.4 / 11.2
HOT (N) vs. MRQ (C) | 17.4 / 31.2 / 17.4 / 31.2 | 17.4 / 11.2 / 17.4 / 22.4
MRQ (C) vs. SAD (N) | 17.4 / 31.2 / 17.4 / 31.2 | 17.4 / 16.8 / 17.4 / 22.4
SCL (C) vs. SAD (N) | 17.4 / 31.2 / 17.4 / 31.2 | 17.4 / 16.8 / 17.4 / 11.2
HOT (N) vs. STN (M) | 17.4 / 41.6 / 17.4 / 22.4 | 17.4 / 11.2 / 17.4 / 11.2
HOT (N) vs. SCP (M) | 17.4 / 41.6 / 17.4 / 22.4 | 17.4 / 11.2 / 17.4 / 11.2
SAD (N) vs. SCP (M) | 17.4 / 41.6 / 17.4 / 22.4 | 17.4 / 11.2 / 17.4 / 22.4
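Reading the optimal-configuration table as data makes a few patterns easy to check. The Python sketch below copies the rows and tests three observations drawn from them, which are not claims made on the slide. First, both CPUs always sit at the lowest 17.4 W level. Second, the throughput-optimal GPU settings sum to 62.4 W or 64.0 W, at or just under the 64 W GPU budget. Third, every energy-efficiency optimum draws less total power than the corresponding throughput optimum. The dictionary keys and helper names are hypothetical.

```python
# (P1 CPU, P1 GPU, P2 CPU, P2 GPU) in W, copied from the optimal-configuration table.
THROUGHPUT_OPT = {
    "MRQ vs. SCL": (17.4, 31.2, 17.4, 31.2),
    "SCP vs. STN": (17.4, 31.2, 17.4, 31.2),
    "SAD vs. HOT": (17.4, 31.2, 17.4, 31.2),
    "MRQ vs. SCP": (17.4, 41.6, 17.4, 22.4),
    "SCL vs. SCP": (17.4, 41.6, 17.4, 22.4),
    "HOT vs. MRQ": (17.4, 31.2, 17.4, 31.2),
    "MRQ vs. SAD": (17.4, 31.2, 17.4, 31.2),
    "SCL vs. SAD": (17.4, 31.2, 17.4, 31.2),
    "HOT vs. STN": (17.4, 41.6, 17.4, 22.4),
    "HOT vs. SCP": (17.4, 41.6, 17.4, 22.4),
    "SAD vs. SCP": (17.4, 41.6, 17.4, 22.4),
}
EE_OPT = {
    "MRQ vs. SCL": (17.4, 16.8, 17.4, 16.8),
    "SCP vs. STN": (17.4, 11.2, 17.4, 11.2),
    "SAD vs. HOT": (17.4, 11.2, 17.4, 16.8),
    "MRQ vs. SCP": (17.4, 22.4, 17.4, 16.8),
    "SCL vs. SCP": (17.4, 11.2, 17.4, 11.2),
    "HOT vs. MRQ": (17.4, 11.2, 17.4, 22.4),
    "MRQ vs. SAD": (17.4, 16.8, 17.4, 22.4),
    "SCL vs. SAD": (17.4, 16.8, 17.4, 11.2),
    "HOT vs. STN": (17.4, 11.2, 17.4, 11.2),
    "HOT vs. SCP": (17.4, 11.2, 17.4, 11.2),
    "SAD vs. SCP": (17.4, 11.2, 17.4, 22.4),
}

Config = tuple[float, float, float, float]

def gpu_total(cfg: Config) -> float:
    """Combined GPU power of both programs."""
    return cfg[1] + cfg[3]

def chip_total(cfg: Config) -> float:
    """Combined CPU + GPU power of both programs."""
    return sum(cfg)

def cpus_at_minimum(cfg: Config, min_cpu: float = 17.4) -> bool:
    """True if both programs' CPUs run at the lowest power level."""
    return cfg[0] == min_cpu and cfg[2] == min_cpu
```

One plausible reading is that these GPU-offloaded benchmarks gain little from extra CPU power, so the throughput optimum spends almost the whole GPU budget. The slide itself does not state this explanation.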


Evaluation: performance improvement from optimal power allocation

• Achieved significant improvement:
  - 12% for throughput
  - 18% for energy efficiency

[Figure: normalized IPS and normalized IPS/W (y-axis 0.9 to 1.3) for each workload pair: MRQ vs. SCL (CC), SCP vs. STN (MM), SAD vs. HOT (NN), MRQ vs. SCP (CM), SCL vs. SCP (CM), HOT vs. MRQ (NC), MRQ vs. SAD (CN), SCL vs. SAD (CN), HOT vs. STN (NM), HOT vs. SCP (NM), SAD vs. SCP (NM), and GEOMEAN.]
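The chart's GEOMEAN bar suggests that the per-pair normalized values are summarized with a geometric mean. Assuming each value is the optimal allocation's IPS (or IPS/W) divided by the baseline configuration's, the headline percentages could be computed as in this Python sketch. The normalization against the baseline and the helper names are assumptions, not stated on the slide.

```python
import math

def normalized(value: float, baseline: float) -> float:
    """Metric under the chosen allocation divided by the baseline (1.0 = no change)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return value / baseline

def geomean(values: list[float]) -> float:
    """Geometric mean, computed via logarithms to avoid overflow in long products."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def improvement_percent(normalized_values: list[float]) -> float:
    """Average improvement over the baseline, e.g. a geomean of 1.12 gives 12%."""
    return (geomean(normalized_values) - 1.0) * 100.0
```

A geometric mean is the usual way to average ratios: a pair that doubles and a pair that halves average to 1.0 rather than 1.25.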


Algorithm for throughput maximization

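1. calculate (slope): obtain sp1 and sp2, the slopes of throughput vs. power allocation for program 1 and program 2
2. If abs(sp1 - sp2) < threshold: alloc(equally)
3. Otherwise, if sp1 > sp2: alloc(p1_more); else: alloc(p2_more)
4. wait(regular_time), then return to step 1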

[Figure: normalized throughput (0.8 to 2.0) vs. power allocation (28.6, 34.2, 39.8, 48.6, 59.0) for compute-bound mri-q and memory-bound stream-copy.]
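The flowchart can be sketched in Python as a periodic controller. This is a minimal sketch, assuming each slope is estimated from two measured (power, throughput) points and that `measure_slopes` and `alloc` are callbacks supplied by the platform. Those callbacks, the finite `iterations` bound, and the threshold values are hypothetical. How much extra power "p1_more" or "p2_more" grants is not specified on the slide.

```python
import time
from typing import Callable

def slope(p_lo: float, perf_lo: float, p_hi: float, perf_hi: float) -> float:
    """Throughput gained per watt between two measured operating points."""
    if p_hi == p_lo:
        raise ValueError("need two distinct power levels")
    return (perf_hi - perf_lo) / (p_hi - p_lo)

def decide(sp1: float, sp2: float, threshold: float) -> str:
    """One pass of the flowchart: which program should receive more power."""
    if abs(sp1 - sp2) < threshold:
        return "equally"   # similar sensitivity: split power evenly
    if sp1 > sp2:
        return "p1_more"   # program 1 gains more throughput per extra watt
    return "p2_more"

def control_loop(measure_slopes: Callable[[], tuple[float, float]],
                 alloc: Callable[[str], None],
                 threshold: float,
                 regular_time: float,
                 iterations: int) -> None:
    """Re-evaluate slopes and reallocate every regular_time seconds (bounded here)."""
    for _ in range(iterations):
        sp1, sp2 = measure_slopes()
        alloc(decide(sp1, sp2, threshold))
        time.sleep(regular_time)
```

Giving more power to the program with the steeper slope follows the motivating plot: compute-bound mri-q gains far more throughput per watt than memory-bound stream-copy.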


Algorithm for energy efficiency maximization

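• Gradient search from the minimum power allocation:
1. final = min_power
2. MAX = max( EE(final), EE(final, p1++), EE(final, p2++) ), where p1++ and p2++ raise program 1's or program 2's power by one step
3. If EE(final) == MAX: exit
4. Otherwise, if EE(final, p1++) > EE(final, p2++): final = (final, p1++); else: final = (final, p2++)
5. Return to step 2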
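The search above is a greedy hill climb. This Python sketch assumes power settings are indexed by discrete levels (0 is the minimum, as with the five CPU or GPU levels on the methodology slide) and that "p1++" means raising program 1 by one level. The callback `ee` is hypothetical and returns measured or modeled energy efficiency. Budget checks are omitted for brevity.

```python
from typing import Callable

def ee_gradient_search(ee: Callable[[int, int], float],
                       n_levels1: int, n_levels2: int) -> tuple[int, int]:
    """Greedy search over discrete power-level indices (0 = minimum power).

    ee(i1, i2) returns the energy efficiency when program 1 runs at level i1
    and program 2 at level i2.
    """
    i1, i2 = 0, 0  # final = min_power
    while True:
        current = ee(i1, i2)
        # Candidates that raise one program's power by one level (p1++, p2++);
        # a program already at its highest level cannot be raised further.
        up1 = ee(i1 + 1, i2) if i1 + 1 < n_levels1 else float("-inf")
        up2 = ee(i1, i2 + 1) if i2 + 1 < n_levels2 else float("-inf")
        if current >= max(up1, up2):  # EE(final) == MAX: stop here
            return i1, i2
        if up1 > up2:
            i1 += 1  # final = (final, p1++)
        else:
            i2 += 1  # final = (final, p2++)
```

Because each step only moves to a neighbouring level, the search needs at most n_levels1 + n_levels2 steps. Like any greedy search, it can stop at a local maximum, which fits the "(near-)optimal" wording in the conclusion.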


Conclusion

• We propose a solution for optimal power allocation
  - Workload-aware power allocation
  - Using program characteristics and evaluation metrics

• Significant performance improvement achieved
  - 12% for throughput
  - 18% for energy efficiency

• Run-time algorithms effectively find (near-)optimal power allocations


Backup slides


Simulator

• Integrated CPU + GPU simulator
  - H. Wang, V. Sathish, R. Singh, M. Schulte, and N. Kim, "Workload and Power Budget Partitioning for Single-Chip Heterogeneous Processors," in PACT, 2012.
  - http://cpu-gpu-sim.ece.wisc.edu/
  - gem5 + GPGPU-Sim

• Adaptive power allocation for multiprogrammed workloads
  - Per-core V/F domains for the CPU
  - 2 V/F domains for the GPU

