Phase-based Cache Reconfiguration for a Highly-Configurable Two-Level Cache Hierarchy
This work was supported by the U.S. National Science Foundation and the Semiconductor Research Corporation
Ann Gordon-Ross+, University of Florida, Department of Electrical and Computer Engineering
Jeremy Lau*, Google Inc.
Brad Calder*, Microsoft Corporation
+ Also affiliated with the NSF Center for High-Performance Reconfigurable Computing
* This work was done while the author was affiliated with the University of California, San Diego
Cache Power Consumption
• Memory access: 50% of an embedded processor's system power
  – Caches are power hungry
    • ARM920T (Segars 01)
    • M*CORE (Lee/Moyer/Arends 99)
• Thus, caches are a good candidate for optimizations
• Different applications have vastly different cache requirements
  – Total size, line size, associativity
[Figure: example cache configurations: 4 KB, 16-byte line, 2-way; 2 KB, 32-byte line, direct-mapped; 8 KB, 64-byte line, 4-way]
Configurable Caches
• Even hard-core processors contain configurable caches
  – Specialized software instructions can change cache parameters
  – Specialized hardware enables the cache to be configured at startup or in-system during runtime
• Motorola M*CORE; Malik ISLPED'00, Albonesi MICRO'00, Zhang ISCA'03
[Figure: an 8 KB, 4-way base cache built from four 2 KB banks. Way concatenation yields 8 KB 2-way and 8 KB direct-mapped; way shutdown yields 4 KB 2-way and 2 KB direct-mapped; a configurable line size is built on a 16-byte physical line size. Tuning hardware (Tuning hw) configures the tunable cache.]
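One way to see how these mechanisms combine is to enumerate the single-cache configurations they can reach. The sketch below is illustrative and rests on stated assumptions: the 8 KB base cache is four 2 KB banks, way shutdown leaves 4, 2, or 1 banks powered, way concatenation allows any power-of-two associativity up to the number of powered banks, and line sizes of 16, 32, and 64 bytes are built from the 16-byte physical line. The function name `enumerate_configs` is hypothetical, and the result covers one cache only, not the full two-level hierarchy studied in this work.

```python
# Illustrative enumeration of one configurable cache's design space.
# Assumptions (not taken from the slides): 2 KB banks, power-of-two associativity,
# and line sizes of 16/32/64 bytes built from a 16-byte physical line.
BANK_KB = 2
NUM_BANKS = 4                 # 4 x 2 KB = 8 KB, 4-way base cache
LINE_SIZES = (16, 32, 64)     # bytes


def enumerate_configs():
    """Return (size_kb, ways, line_bytes) tuples reachable by way shutdown and way concatenation."""
    configs = []
    active = NUM_BANKS
    while active >= 1:                      # way shutdown: 4, then 2, then 1 powered bank(s)
        size_kb = active * BANK_KB
        ways = 1
        while ways <= active:               # way concatenation: fewer, larger ways over the active banks
            for line in LINE_SIZES:         # configurable line size
                configs.append((size_kb, ways, line))
            ways *= 2
        active //= 2                        # power down half of the remaining banks
    return configs


configs = enumerate_configs()
print(len(configs))  # 18
```

Under these assumptions the space has 18 configurations: for example, (8, 4, 16) is the base cache and (2, 1, 64) is the smallest cache with the longest line.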
Cache Tuning
• Cache tuning is the process of determining the appropriate cache parameters for an application
  – Requires a tunable cache
    • Cache parameter values can be varied during runtime
  – Requires tuning hardware
    • Orchestrates cache tuning
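[Figure: energy over time as the application is downloaded to the microprocessor, executes in the base configuration, and is then stepped by the tuning hardware (Tuning hw) through a sequence of tunable-cache (TC) configurations during cache tuning]
• Cache energy savings of 62% on average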
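The tuning step can be sketched as a search that scores every candidate configuration with an energy model and keeps the cheapest one. This is a minimal sketch under assumed numbers: the per-access energy formula, the miss energy, the static energy per cycle, and the hit/miss/cycle counts are all hypothetical placeholders, and an exhaustive search stands in for whatever search heuristic the tuning hardware actually uses.

```python
from dataclasses import dataclass

# Hypothetical energy parameters (nJ); a real model would derive these from a cache energy tool.
E_MISS_NJ = 20.0              # off-chip access plus line refill
E_STATIC_NJ_PER_CYCLE = 0.01


@dataclass(frozen=True)
class CacheStats:
    hits: int
    misses: int
    cycles: int


def access_energy_nj(size_kb, ways):
    # Hypothetical: larger and more associative caches cost more per access.
    return 0.05 * size_kb + 0.1 * ways


def cache_energy_nj(config, stats):
    """Estimated cache energy of one run under config = (size_kb, ways, line_bytes)."""
    size_kb, ways, _line = config
    dynamic = (stats.hits + stats.misses) * access_energy_nj(size_kb, ways)
    dynamic += stats.misses * E_MISS_NJ
    static = stats.cycles * E_STATIC_NJ_PER_CYCLE
    return dynamic + static


def tune(stats_by_config):
    """Return the configuration with the lowest estimated cache energy."""
    return min(stats_by_config, key=lambda cfg: cache_energy_nj(cfg, stats_by_config[cfg]))


# Hypothetical statistics for one application run under three configurations.
stats = {
    (8, 4, 32): CacheStats(hits=990_000, misses=10_000, cycles=1_200_000),
    (4, 2, 32): CacheStats(hits=985_000, misses=15_000, cycles=1_250_000),
    (2, 1, 32): CacheStats(hits=950_000, misses=50_000, cycles=1_600_000),
}
best = tune(stats)
print(best)  # (4, 2, 32)
```

With these made-up numbers the mid-sized 4 KB, 2-way cache wins: the 8 KB cache pays more per access, and the 2 KB cache pays for its extra misses.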
Phase-Based Cache Tuning
• However, applications show varying operating requirements throughout execution
• Greater energy savings potential if the cache can be tuned for each of these phases
[Figure: time-varying behavior of IPC, level-one data cache hits, branch predictor hits, and power consumption for gcc (integrate input set)]
[Figure: energy consumption over time for the base cache, an application-tuned cache, and a phase-tuned cache that changes the cache configuration at phase changes]
• Need a method to detect phase changes during runtime
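The energy argument for phase-based tuning is that picking the best configuration per phase can never do worse than picking one configuration for the whole run, because the sum of per-phase minima is at most the minimum of per-configuration sums. The sketch below checks this on a hypothetical two-phase, two-configuration energy table and, as a simplifying assumption, ignores the cost of reconfiguring the cache at phase changes, which a real system must pay.

```python
# Hypothetical energy (arbitrary units) of each phase under each configuration.
# Outer keys are phase ids; inner keys are (size_kb, ways, line_bytes) configurations.
ENERGY = {
    0: {(8, 4, 64): 300, (2, 1, 16): 500},
    1: {(8, 4, 64): 400, (2, 1, 16): 150},
}


def application_tuned(energy):
    """Best single configuration used for every phase: min over configs of the summed energy."""
    configs = next(iter(energy.values())).keys()
    return min(sum(per_phase[cfg] for per_phase in energy.values()) for cfg in configs)


def phase_tuned(energy):
    """Best configuration chosen independently per phase (reconfiguration cost ignored)."""
    return sum(min(per_phase.values()) for per_phase in energy.values())


print(application_tuned(ENERGY))  # 650
print(phase_tuned(ENERGY))        # 450
```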
Phase Classification
• Break the application into fixed-size intervals
  – Intervals measured in dynamic instructions executed
• Group intervals with similar characteristics as the same phase
• Optimizations applied to one interval of a phase are expected to work equally well for every other interval of the same phase
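A common way to implement this grouping is to summarize each interval as a basic block vector (how many instructions each basic block contributed) and treat intervals whose normalized vectors are close as the same phase. The sketch below uses a simple leader-style clustering with a Manhattan-distance threshold; it is a simplified stand-in, not necessarily the classifier used in this work (offline classifiers such as SimPoint use k-means instead), and the 0.5 threshold and example vectors are assumptions.

```python
from typing import Dict, List

# Basic block vector: basic-block id -> instructions executed in that block during one interval.
BBV = Dict[int, int]


def normalize(bbv: BBV) -> Dict[int, float]:
    total = sum(bbv.values())
    if total == 0:
        return {}
    return {bb: count / total for bb, count in bbv.items()}


def manhattan(a: Dict[int, float], b: Dict[int, float]) -> float:
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys())


def classify(intervals: List[BBV], threshold: float = 0.5) -> List[int]:
    """Assign each interval the id of the first phase whose representative is within threshold."""
    representatives: List[Dict[int, float]] = []
    phase_ids: List[int] = []
    for bbv in intervals:
        vec = normalize(bbv)
        for pid, rep in enumerate(representatives):
            if manhattan(vec, rep) < threshold:
                phase_ids.append(pid)
                break
        else:  # no existing phase is close enough: this interval starts a new phase
            representatives.append(vec)
            phase_ids.append(len(representatives) - 1)
    return phase_ids


# Hypothetical intervals from two alternating code regions.
intervals = [
    {1: 900, 2: 100},
    {1: 880, 2: 120},
    {3: 700, 4: 300},
    {1: 910, 2: 90},
    {3: 650, 4: 350},
]
print(classify(intervals))  # [0, 0, 1, 0, 1]
```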
Phase Prediction
• Predict when a phase transition will occur and which phase will be entered
• Uses two predictors:
  – Set of phases leading up to the next phase
  – Duration of time spent in phases
• Benefit for cache tuning
  – Can determine the best configuration for each phase, save that configuration, and then change directly to it when the phase is predicted
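The two predictors can be folded into one table keyed by the current phase and how long execution has stayed in it, in the spirit of a run-length-encoded Markov predictor; the saved best configuration of a phase is then applied as soon as that phase is predicted. This is a minimal sketch: the class and function names, the fallback of predicting the current phase for unseen histories, the unbounded run-length counter (hardware would saturate it), and the configuration table are all assumptions for illustration.

```python
class RunLengthPhasePredictor:
    """Hypothetical predictor: next phase from (current phase, intervals spent in it)."""

    def __init__(self):
        self.table = {}        # (phase, run_length) -> phase that followed that history last time
        self.current = None
        self.run_length = 0

    def predict(self):
        if self.current is None:
            return None
        # Unseen history: assume execution stays in the current phase.
        return self.table.get((self.current, self.run_length), self.current)

    def update(self, actual):
        if self.current is not None:
            self.table[(self.current, self.run_length)] = actual
        if actual == self.current:
            self.run_length += 1
        else:
            self.current, self.run_length = actual, 1


def run(phase_trace, best_config):
    """Reconfigure to the saved best configuration of each predicted phase."""
    predictor = RunLengthPhasePredictor()
    active, switches, predictions = None, 0, []
    for actual in phase_trace:
        predicted = predictor.predict()
        predictions.append(predicted)
        if predicted is not None and best_config[predicted] != active:
            active = best_config[predicted]   # change directly to the saved configuration
            switches += 1
        predictor.update(actual)
    return predictions, switches


# Hypothetical per-phase configurations (size_kb, ways, line_bytes) found offline.
BEST = {0: (8, 4, 32), 1: (2, 1, 16)}
preds, n_switches = run([0, 0, 1, 0, 0, 1, 0, 0, 1], BEST)
print(preds)       # [None, 0, 0, 1, 0, 1, 0, 0, 1]
print(n_switches)  # 6
```

The third and fourth intervals are mispredicted while the table is untrained; from the fifth interval on, every prediction is correct, because the table has learned that phase 0 lasts two intervals and phase 1 lasts one.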
Experimental Results
• Examined a large selection of SPEC2000 integer and floating-point benchmarks
• Phase-classified each entire benchmark
• Determined the best cache configuration for each phase
• Modified SimpleScalar with a configurable cache
• Executed benchmarks in their entirety with SimpleScalar to gather cache hit and miss statistics
Phase-Based Tuning Methodology
[Figure: tuning flow. App → execution profiling (fast functional simulation) → execution profile → phase classification → phase starting points → checkpoint files → for each phase: simulation → energy model (ACE-AWT, Perl) → cache configuration for each phase]
Results - Energy Consumption
[Figure: energy consumption normalized to the base cache configuration (0% to 100%) for gzip_graphic, gzip_log, gzip_random, gcc_scilab, gcc_166, gcc_expr, gcc_200, gcc_integrate, equake, art_110, art_470, vpr_place, vpr_route, vortex_one, vortex_two, vortex_three, bzip_source, bzip_graphic, bzip_program, avg, and avg modified; series: application-based tuning of the highly configurable cache vs. phase-based tuning of the highly configurable cache]
Note: Avg modified averages only the benchmarks where phase-based tuning is favorable
Results - Performance
[Figure: execution time normalized to the base cache configuration (0% to 140%) for gzip_graphic, gzip_log, gzip_random, gcc_scilab, gcc_166, gcc_expr, gcc_200, gcc_integrate, equake, art_110, art_470, vpr_place, vpr_route, vortex_one, vortex_two, vortex_three, bzip_source, bzip_graphic, bzip_program, avg, and avg modified; series: application-based tuning of the highly configurable cache vs. phase-based tuning of the highly configurable cache]
Note: Avg modified averages only the benchmarks where phase-based tuning is favorable
Results - Energy Savings
Energy savings compared to previous phase-based tuning techniques
[Figure: energy consumption normalized to the base cache configuration (0% to 100%) for gzip_graphic, gzip_log, gzip_random, gcc_scilab, gcc_166, gcc_expr, gcc_200, gcc_integrate, equake, art_110, art_470, vpr_place, vpr_route, vortex_one, vortex_two, vortex_three, bzip_source, bzip_graphic, bzip_program, and avg; series: phase-based/best of 2, phase-based/best of 5, and phase-based/highly configurable cache]
Design Space Exploration Speedup
| Benchmark | Application-based sim time (days) | Phase-based sim time (days) | Speedup |
|---|---|---|---|
| gzip_graphic | 12.57 | 1.59 | 7.92 |
| gzip_log | 4.79 | 0.61 | 7.88 |
| gzip_random | 9.96 | 1.25 | 7.94 |
| gcc_scilab | 7.52 | 1.02 | 7.35 |
| gcc_166 | 5.68 | 0.73 | 7.81 |
| gcc_expr | 1.46 | 0.20 | 7.49 |
| gcc_200 | 13.17 | 1.81 | 7.29 |
| gcc_integrate | 1.59 | 0.21 | 7.70 |
| equake | 15.94 | 2.00 | 7.97 |
| art_110 | 5.06 | 0.67 | 7.54 |
| art_470 | 5.46 | 0.71 | 7.69 |
| eon_cook | 9.77 | 1.22 | 7.99 |
| eon_kajiya | 12.27 | 1.53 | 8.00 |
| eon_rushmeier | 7.01 | 0.88 | 7.99 |
| vpr_place | 0.41 | 0.05 | 7.96 |
| vpr_route | 10.19 | 1.28 | 7.96 |
| vortex_one | 14.42 | 1.80 | 7.99 |
| vortex_two | 16.80 | 2.10 | 7.99 |
| vortex_three | 16.12 | 2.02 | 7.99 |
| bzip_source | 13.19 | 1.76 | 7.48 |
| bzip_graphic | 17.39 | 2.21 | 7.86 |
| bzip_program | 15.14 | 2.04 | 7.43 |
| avg | 9.81 | 1.26 | 7.78 |
| totals | 215.91 | 27.69 | |
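The aggregate speedup can be recomputed from the simulation times in the design-space-exploration table. The snippet below simply transcribes those day counts; because the table reports rounded values, per-benchmark ratios recomputed from them can differ slightly from the listed speedups, and the ratio of total times (about 7.80x) is a different statistic from the listed average of per-benchmark speedups (7.78x).

```python
# Simulation time in days: (application-based, phase-based), transcribed from the table.
SIM_DAYS = {
    "gzip_graphic": (12.57, 1.59), "gzip_log": (4.79, 0.61), "gzip_random": (9.96, 1.25),
    "gcc_scilab": (7.52, 1.02), "gcc_166": (5.68, 0.73), "gcc_expr": (1.46, 0.20),
    "gcc_200": (13.17, 1.81), "gcc_integrate": (1.59, 0.21), "equake": (15.94, 2.00),
    "art_110": (5.06, 0.67), "art_470": (5.46, 0.71), "eon_cook": (9.77, 1.22),
    "eon_kajiya": (12.27, 1.53), "eon_rushmeier": (7.01, 0.88), "vpr_place": (0.41, 0.05),
    "vpr_route": (10.19, 1.28), "vortex_one": (14.42, 1.80), "vortex_two": (16.80, 2.10),
    "vortex_three": (16.12, 2.02), "bzip_source": (13.19, 1.76), "bzip_graphic": (17.39, 2.21),
    "bzip_program": (15.14, 2.04),
}

# Total simulation time under each approach, and the speedup of the totals.
app_total = sum(app for app, _ in SIM_DAYS.values())
phase_total = sum(phase for _, phase in SIM_DAYS.values())

print(f"{app_total:.2f} {phase_total:.2f}")                # 215.91 27.69
print(f"overall speedup: {app_total / phase_total:.2f}x")  # overall speedup: 7.80x
```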
Conclusions
• Phase-based cache tuning for a highly configurable cache
  – 1800x greater configurability compared to previous methods
• Comparable energy savings to application-based tuning
  – 8% greater savings on average
• 8x speedup in design space exploration time
• 17% additional energy savings compared to previous methods