Date post: 16-Jan-2016
Page 1: Phase-based Cache Reconfiguration for a Highly-Configurable Two-Level Cache Hierarchy

Ann Gordon-Ross+, University of Florida, Department of Electrical and Computer Engineering
Jeremy Lau*, Google Inc.
Brad Calder*, Microsoft Corporation

+ Also affiliated with the NSF Center for High-Performance Reconfigurable Computing
* This work was done while the author was affiliated with the University of California, San Diego

This work was supported by the U.S. National Science Foundation and the Semiconductor Research Corporation.
Page 2: Cache Power Consumption

• Memory access: 50% of embedded processor’s system power
  – Caches are power hungry
    • ARM920T (Segars 01)
    • M*CORE (Lee/Moyer/Arends 99)
• Thus, caches are a good candidate for optimizations
• Different applications have vastly different cache requirements
  – Total size, line size, associativity
  – Examples: 4 KB with 16 byte lines, 2-way; 2 KB with 32 byte lines, direct-mapped; 8 KB with 64 byte lines, 4-way

Page 3: Configurable Caches

• Even hard processors contain configurable caches
  – Specialized software instructions can change cache parameters
  – Specialized hardware enables the cache to be configured at startup or in system during runtime
• Motorola M*CORE – Malik ISLPED’00, Albonesi MICRO’00, Zhang ISCA’03

[Figure: a tunable cache with tuning hardware, drawn as four 2 KB blocks in each configuration.
  Way concatenation: 8 KB, 4-way base cache; 8 KB, 2-way; 8 KB, direct-mapped.
  Way shutdown: 4 KB, 2-way; 2 KB, direct-mapped.
  Configurable line size: 16 byte physical line size.]
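The size, associativity, and line-size options in the figure can be enumerated in a few lines. The sketch below is illustrative only. It assumes that one, two, or four of the 2 KB blocks are active (way shutdown), that associativity cannot exceed the number of active blocks (way concatenation), and that 16, 32, and 64 byte lines are built from the 16 byte physical line. These rules are assumptions chosen to be consistent with the figure, not a specification of the hardware.

```python
from itertools import product

BANK_KB = 2                  # each block in the figure holds 2 KB
ACTIVE_BANKS = (1, 2, 4)     # assumed way-shutdown options
WAYS = (1, 2, 4)             # direct-mapped, 2-way, 4-way
LINE_SIZES = (16, 32, 64)    # assumed multiples of the 16 byte physical line


def configurations():
    """Return (size_kb, ways, line_bytes) tuples for one cache level."""
    configs = []
    for active, ways, line in product(ACTIVE_BANKS, WAYS, LINE_SIZES):
        # Assumption: a cache cannot have more ways than active blocks.
        if ways <= active:
            configs.append((active * BANK_KB, ways, line))
    return configs


CONFIGS = configurations()
```

Under these assumptions `(8, 4, 16)` is the base configuration from the figure, and a tuple such as `(2, 2, 16)` is excluded because a single 2 KB block cannot be 2-way.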

Page 4: Cache Tuning

• Cache tuning is the process of determining the appropriate cache parameters for an application
  – Requires a tunable cache
    • Cache parameter values can be varied during runtime
  – Requires tuning hardware
    • Orchestrates cache tuning

[Figure: energy over time for a microprocessor with a tunable cache (TC) and tuning hardware. Labels: download application, executing in base configuration, cache tuning.]

Cache energy savings of 62% on average!

Page 5: Phase-Based Cache Tuning

• However, applications show varying operating requirements throughout execution
• Greater energy savings potential if the cache can be tuned for each one of these phases

[Figure: time-varying behavior for IPC, level one data cache hits, branch predictor hits, and power consumption for gcc (using the integrate input set).]

[Figure: energy consumption over time for base cache energy, application-tuned, and phase-tuned caches, with "Change cache" points marked.]

Need a method to detect phase changes during runtime

Page 6: Phase Classification

• Break application into fixed-size intervals
  – Intervals measured in dynamic instructions executed
• Group intervals with similar characteristics as the same phase
• Optimizations applied to one interval of a phase will work equally well with every other interval of the same phase
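The grouping step can be illustrated with a small threshold-based classifier. This is a minimal sketch, not the classifier used in this work: the per-interval signature (here a hypothetical vector of counts, one entry per code region), the Manhattan distance, and the `threshold` parameter are all assumptions made for illustration.

```python
def classify_intervals(signatures, threshold):
    """Assign a phase ID to each fixed-size interval.

    signatures: one vector of counts per interval (hypothetical format).
    threshold:  maximum Manhattan distance between normalized vectors
                for two intervals to share a phase (assumed parameter).
    """
    representatives = []  # normalized signature of the first interval in each phase
    phase_ids = []
    for sig in signatures:
        total = sum(sig) or 1
        norm = [count / total for count in sig]
        for pid, rep in enumerate(representatives):
            distance = sum(abs(a - b) for a, b in zip(norm, rep))
            if distance <= threshold:
                phase_ids.append(pid)
                break
        else:
            # No existing phase is close enough, so start a new one.
            representatives.append(norm)
            phase_ids.append(len(representatives) - 1)
    return phase_ids


# Made-up signatures for five intervals over two code regions.
EXAMPLE_PHASES = classify_intervals(
    [[90, 10], [88, 12], [10, 90], [91, 9], [12, 88]], threshold=0.2
)
```

With these made-up inputs, intervals 0, 1, and 3 fall into one phase and intervals 2 and 4 into another, so a configuration chosen for interval 0 would be reused for intervals 1 and 3.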

Page 7: Phase Prediction

• Predict when a phase transition will occur and which phase will be entered
• Uses two predictors:
  – Set of phases leading up to the next phase
  – Duration of time spent in phases
• Benefit for cache tuning
  – Can determine best configuration for each phase, save that configuration, and then change directly to it when the phase is predicted
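One way to picture the two predictors is a pair of lookup tables: one keyed on the recent sequence of phases, one recording how long each phase has lasted. The sketch below is an assumption-laden illustration rather than the predictor used in this work. The class name, the `history_len` parameter, the majority-vote rule, and the mean-duration rule are all hypothetical.

```python
from collections import Counter, defaultdict


class PhasePredictor:
    """Toy predictor: phase history -> next phase, and phase -> duration."""

    def __init__(self, history_len=2):
        self.history_len = history_len
        self.next_phase = defaultdict(Counter)  # history tuple -> next-phase counts
        self.durations = defaultdict(list)      # phase -> run lengths in intervals

    def train(self, phase_ids):
        # Compress the per-interval phase stream into runs of [phase, length].
        runs = []
        for pid in phase_ids:
            if runs and runs[-1][0] == pid:
                runs[-1][1] += 1
            else:
                runs.append([pid, 1])
        for i, (pid, length) in enumerate(runs):
            self.durations[pid].append(length)
            if i >= 1:
                start = max(0, i - self.history_len)
                history = tuple(p for p, _ in runs[start:i])
                self.next_phase[history][pid] += 1

    def predict_next(self, recent_phases):
        """Most frequent successor of the recent phase history, or None."""
        history = tuple(recent_phases[-self.history_len:])
        counts = self.next_phase.get(history)
        return counts.most_common(1)[0][0] if counts else None

    def predict_duration(self, phase):
        """Mean observed run length of a phase in intervals, or None."""
        lengths = self.durations.get(phase)
        return round(sum(lengths) / len(lengths)) if lengths else None


PREDICTOR = PhasePredictor(history_len=2)
PREDICTOR.train([0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0])
```

Once a next phase is predicted, the saved configuration for that phase can be looked up in a table (for example a dictionary from phase ID to configuration) and applied directly, which is the benefit the slide describes.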

Page 8: Experimental Results

• Examined a large selection of SPEC2000 integer and floating point benchmarks
• Phase classified entire benchmark
• Determined best cache configuration for each phase
• Modified SimpleScalar with configurable cache
• Executed benchmarks in their entirety with SimpleScalar to gather cache hit and miss statistics

Page 9: Phase-Based Tuning Methodology

[Figure: tuning flow diagram. Labels: App; Execution Profiling; Execution Profile; Phase Classification; Phase starting points; Fast Functional Simulation; Checkpoint files; For each phase: Simulation, ACE-AWT (Perl), Energy Model; Cache configuration for each phase.]
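The final step of the flow, choosing a cache configuration for each phase from simulation statistics and an energy model, can be sketched as a minimum search. Everything specific here is an assumption: the energy equation (dynamic hit and miss energy plus static energy per cycle), the parameter names, and all numbers are placeholders for illustration and are not taken from this work.

```python
def cache_energy(hits, misses, cycles, e_hit, e_miss, p_static):
    """Hypothetical energy model: dynamic access energy plus static energy."""
    return hits * e_hit + misses * e_miss + cycles * p_static


def best_config_per_phase(stats, energy_params):
    """Pick the lowest-energy configuration for every phase.

    stats[phase][config] = (hits, misses, cycles) from simulating that phase.
    energy_params[config] = per-configuration energy model parameters.
    """
    best = {}
    for phase, per_config in stats.items():
        best[phase] = min(
            per_config,
            key=lambda cfg: cache_energy(*per_config[cfg], **energy_params[cfg]),
        )
    return best


# Placeholder parameters and statistics, invented for this example only.
ENERGY_PARAMS = {
    "8KB_4way": dict(e_hit=1.0, e_miss=20.0, p_static=0.4),
    "2KB_dm": dict(e_hit=0.3, e_miss=20.0, p_static=0.1),
}
STATS = {
    "phase_A": {"8KB_4way": (900, 100, 2000), "2KB_dm": (880, 120, 2100)},
    "phase_B": {"8KB_4way": (950, 50, 1500), "2KB_dm": (600, 400, 3000)},
}
BEST = best_config_per_phase(STATS, ENERGY_PARAMS)
```

With these placeholder numbers the small cache wins for `phase_A`, where it misses only slightly more often, and the large cache wins for `phase_B`, where the small cache's extra misses dominate its energy.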

Page 10: Results - Energy Consumption

[Figure: bar chart of energy consumption normalized to the base cache configuration (y-axis 0% to 100%).
  Series: application-based / highly configurable cache; phase-based / highly configurable cache.
  Benchmarks: gzip_graphic, gzip_log, gzip_random, gcc_scilab, gcc_166, gcc_expr, gcc_200, gcc_integrate, equake, art_110, art_470, vpr_place, vpr_route, vortex_one, vortex_two, vortex_three, bzip_source, bzip_graphic, bzip_program, avg, avg modified.]

Note: "avg modified" averages only the benchmarks where phase-based tuning is favorable

Page 11: Results - Performance

[Figure: bar chart of execution time normalized to the base cache configuration (y-axis 0% to 140%).
  Series: application-based / highly configurable cache; phase-based / highly configurable cache.
  Benchmarks: gzip_graphic, gzip_log, gzip_random, gcc_scilab, gcc_166, gcc_expr, gcc_200, gcc_integrate, equake, art_110, art_470, vpr_place, vpr_route, vortex_one, vortex_two, vortex_three, bzip_source, bzip_graphic, bzip_program, avg, avg modified.]

Note: "avg modified" averages only the benchmarks where phase-based tuning is favorable

Page 12: Results - Energy Savings

[Figure: bar chart of energy consumption normalized to the base cache configuration (y-axis 0% to 100%).
  Series: phase-based / best of 2; phase-based / best of 5; phase-based / highly configurable cache.
  Benchmarks: gzip_graphic, gzip_log, gzip_random, gcc_scilab, gcc_166, gcc_expr, gcc_200, gcc_integrate, equake, art_110, art_470, vpr_place, vpr_route, vortex_one, vortex_two, vortex_three, bzip_source, bzip_graphic, bzip_program, avg.]

Energy savings compared to previous phase-based tuning techniques

Page 13: Design Space Exploration Speedup

Benchmark        Application-based     Phase-based          Speedup
                 sim time (days)       sim time (days)
gzip_graphic     12.57                 1.59                 7.92
gzip_log          4.79                 0.61                 7.88
gzip_random       9.96                 1.25                 7.94
gcc_scilab        7.52                 1.02                 7.35
gcc_166           5.68                 0.73                 7.81
gcc_expr          1.46                 0.20                 7.49
gcc_200          13.17                 1.81                 7.29
gcc_integrate     1.59                 0.21                 7.70
equake           15.94                 2.00                 7.97
art_110           5.06                 0.67                 7.54
art_470           5.46                 0.71                 7.69
eon_cook          9.77                 1.22                 7.99
eon_kajiya       12.27                 1.53                 8.00
eon_rushmeier     7.01                 0.88                 7.99
vpr_place         0.41                 0.05                 7.96
vpr_route        10.19                 1.28                 7.96
vortex_one       14.42                 1.80                 7.99
vortex_two       16.80                 2.10                 7.99
vortex_three     16.12                 2.02                 7.99
bzip_source      13.19                 1.76                 7.48
bzip_graphic     17.39                 2.21                 7.86
bzip_program     15.14                 2.04                 7.43
avg               9.81                 1.26                 7.78
totals          215.91                27.69
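The Speedup column is the ratio of application-based to phase-based simulation time. The short sketch below recomputes that ratio from the totals row. The function name is hypothetical, and because the table's times are rounded to two decimals, ratios recomputed from them can differ slightly from the printed per-benchmark speedups.

```python
def speedup(app_based_days, phase_based_days):
    """Ratio of application-based to phase-based simulation time."""
    return app_based_days / phase_based_days


# Totals row of the table: 215.91 days versus 27.69 days.
TOTAL_SPEEDUP = speedup(215.91, 27.69)
```

The ratio of the totals comes out at about 7.8, in line with the average speedup of 7.78 reported in the table.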

Page 14: Conclusions

• Phase-based cache tuning for a highly configurable cache
  – 1800x greater configurability compared to previous methods
• Comparable energy savings to application-based tuning
  – 8% greater savings on average
• 8x speedup in design space exploration time
• 17% additional energy savings compared to previous methods

