Characterization and Dynamic Mitigation of Intra-Application Cache Interference
Carole-Jean Wu and Margaret Martonosi, Princeton University
4/11/2011
2011 International Symposium on Performance Analysis of Systems and Software (ISPASS)
Today’s CMP systems
[Figure: a chip multiprocessor with four SMT CPU cores (Core 0 to Core 3). Each core has private L1I$, L1D$, and L2$; all cores share an 8MB L3 cache, next to the memory controller, the communication bridge, and IO & QPI. The operating system runs several applications (App. 1 to App. 4) on the cores.]
1/23
Within a single application, cache interference can stem from…
[Figure: the same four-core system (private L1I$, L1D$, and L2$ per core; shared 8MB L3 cache; memory controller; communication bridge; IO & QPI) running a single application, App. 1. Requests reaching the shared cache: App. data ld/st, TLB miss handling, HW prefetch req., other OS req.]
2/23
Real-System LLC Miss Characterization
[Figure: stacked bar chart, one bar per benchmark: mcf, libquantum, lbm, omnetpp, astar, namd, povray, soplex, sphinx3, sjeng, bzip2, gcc, gobmk, h264ref. Y-axis: Percentage (%), 0% to 100%. Legend: Application LLC Misses; Others [prefetching, page table walks, etc.].]
>50% of LLC misses are due to prefetching, TLB miss handling, other OS refs, etc.
3/23
Prior Work for Intra-Application Cache Interference
• System-induced Cache Interference
– Characterization indicates significant OS/user cache interference [Agarwal et al. TOC ’88] [Torrellas et al. ASPLOS ’92]
– Reduce TLB miss handling effects [Jacob, Mudge ASPLOS ’98] [Bhargava et al. ASPLOS ’08] [Barr, Cox, and Rixner ISCA ’10]
• Prefetch-induced Cache Interference
– Prefetch buffer/filter [Peir et al. ICS ’02] [Hur and Lin MICRO ’06]
– Replacement policies (prefetch bit per cache line) [Alameldeen and Wood ISCA ’07] [Lin et al. HPCA ’01]
– Prefetching algorithms [Ebrahimi et al. MICRO ’09] [Nesbit et al. ISCA ’07] [Iacobovici et al. ICS ’04]
But all of these require hardware modification.
4/23
Contributions of This Paper
1. Cache interference within an application is a problem
– Real-system characterization
– Detailed full-system simulation
2. Dynamic management mechanisms
– System-aware cache management
– Real-system, real-time prefetch manager
5/23
Talk Outline
• Motivation and Prior Work
• Measurement Methodology
• Intra-Application Interference Characterization
• Dynamic Mitigation of LLC Interference
– System-Aware Cache Management
– Real-System Dynamic Prefetch Manager
• Conclusion
6/23
Measurement Methodology
• Real-system infrastructure
– Intel Nehalem-based Core i7 (Bloomfield)
– perfmon2 to access hardware performance monitoring counters (PMCs)
• Full-system simulation
– Simics/GEMS
• Benchmarks
– SPEC CPU2006 benchmark suite
7/23
System-Mode Reference Breakdown
[Figure: stacked bar chart, one bar per benchmark: mcf, lbm, omnetpp, astar, soplex, sphinx3, sjeng, bzip2, gcc, gobmk. Y-axis: System-Mode Reference Breakdown, 0% to 100%. Legend: page table walk references; other system-mode references.]
80% of system references are due to TLB miss handling (details in the paper).
8/23
Memory Reuse Characteristics Analysis for User References
[Figure: bar chart titled "User-Mode References" for mcf, sphinx3, sjeng, bzip2, and Avg. Y-axis: Zero-reused cache lines, 0% to 100%. Legend: zero-reused cache lines [baseline]; zero-reused cache lines [user only]. Side labels: User, System.]
System cache lines destroy good data locality of user lines when sharing the cache!
9/23
Memory Reuse Characteristics Analysis for System References
[Figure: bar chart titled "System-Mode References" for mcf, sphinx3, sjeng, bzip2, and Avg. Y-axis: Zero-reused cache lines, 0% to 100%. Legend: zero-reused cache lines [baseline]; zero-reused cache lines [system only]. Side labels: User, System.]
The majority of system cache lines are not reused. Bypassing system cache lines?
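The zero-reuse metric on the two preceding slides can be illustrated with a small sketch. It is not the measurement setup from the talk: it assumes a fully associative LRU cache and assumes that a line counts as "zero-reused" when it is never hit between its insertion and its eviction (or the end of the trace). The function name and the trace format are hypothetical.

```python
from collections import OrderedDict


def zero_reuse_fraction(trace, num_lines):
    """Fraction of inserted lines that were never hit, split by user/system.

    trace: iterable of (address, is_system) pairs.
    num_lines: capacity of a fully associative LRU cache (a simplifying
    assumption; real LLCs are set associative).
    """
    cache = OrderedDict()  # address -> [is_system, hit_count]; last item is MRU
    inserted = {"user": 0, "system": 0}
    zero_reused = {"user": 0, "system": 0}

    def retire(entry):
        is_system, hits = entry
        if hits == 0:
            zero_reused["system" if is_system else "user"] += 1

    for addr, is_system in trace:
        if addr in cache:
            cache[addr][1] += 1
            cache.move_to_end(addr)  # promote to MRU on a hit
            continue
        if len(cache) >= num_lines:
            _, victim = cache.popitem(last=False)  # evict the LRU line
            retire(victim)
        cache[addr] = [is_system, 0]
        inserted["system" if is_system else "user"] += 1

    for entry in cache.values():  # account for lines still resident at the end
        retire(entry)
    return {k: (zero_reused[k] / inserted[k] if inserted[k] else 0.0)
            for k in inserted}


# Two user lines that fit in a two-line cache are both reused...
user_only = [(1, False), (2, False), (1, False), (2, False)]
# ...but a single system line in between evicts them before they are reused.
mixed = [(1, False), (2, False), (100, True), (1, False), (2, False)]
```

In this toy trace, the user-only run reuses every user line, while one interleaved system reference is enough to turn every user line into a zero-reused line, which is the effect the slides describe.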
10/23
System-Aware Cache Management
[Figure: a recency stack ordered from MRU to LRU, with an incoming reference (Refs) 0xEEEA.]
11/23
System-Aware Cache Management
[Figure: a recency stack ordered from MRU through MID to LRU. Incoming references (Refs): 0xEEEA, 0xDFAE, 0x1234, 0xDADA, 0xEEAF, 0x001A.]
12/23
System-Aware Cache Management
SYS-LRUinsert
[Figure: incoming references (0xEEEA, 0xDFAE, 0x1234, 0xDADA, 0xEEAF, …) are split into system and user references before entering the recency stack (MRU, MID, LRU).]
13/23
System-Aware Cache Management
SYS-MIDinsert
[Figure: incoming references (0xEEEA, 0x1234, 0xDADA, 0xEEAF, …) are split into system and user references before entering the recency stack (MRU, MID, LRU).]
14/23
System-Aware Cache Management
SYS-DYNAMIC
[Figure: incoming references (0xBEEF, 0x1234, 0xDADA, 0xEEAF, …) are split into system and user references before entering the recency stack (MRU, MID, LRU).]
*Set sampling: DIP [Qureshi et al. ISCA ’07]
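The three policies named on these slides can be sketched for a single cache set as follows. This is an illustration under stated assumptions, not the implementation evaluated in the talk: user lines are assumed to be inserted at MRU, hits are assumed to promote a line to MRU, and the system-line insertion point ("MRU" for the baseline, "MID", or "LRU") is a parameter. The `sys_dynamic_choice` helper is a hypothetical offline stand-in for set sampling that simply picks the candidate with the fewest misses on a sample trace; the slides do not say which policies SYS-DYNAMIC selects between.

```python
class CacheSet:
    """One cache set as a recency stack: index 0 is MRU, the last index is LRU."""

    def __init__(self, ways, system_insert="LRU"):
        self.ways = ways
        self.system_insert = system_insert  # "MRU" (baseline), "MID", or "LRU"
        self.stack = []

    def access(self, tag, is_system):
        """Return True on a hit and False on a miss."""
        if tag in self.stack:
            self.stack.remove(tag)
            self.stack.insert(0, tag)  # promote to MRU on a hit
            return True
        if len(self.stack) == self.ways:
            self.stack.pop()  # evict the line in the LRU position
        if not is_system or self.system_insert == "MRU":
            pos = 0  # user lines (and baseline system lines) enter at MRU
        elif self.system_insert == "MID":
            pos = len(self.stack) // 2
        else:
            pos = len(self.stack)  # system line enters at the LRU position
        self.stack.insert(pos, tag)
        return False


def count_misses(trace, ways, system_insert):
    cache_set = CacheSet(ways, system_insert)
    return sum(not cache_set.access(tag, is_system) for tag, is_system in trace)


def sys_dynamic_choice(sample_trace, ways, candidates=("MRU", "MID", "LRU")):
    """Hypothetical stand-in for set sampling: fewest misses on a sample wins."""
    return min(candidates, key=lambda p: count_misses(sample_trace, ways, p))


# Four user lines that fit in a 4-way set, with one new system line per round.
trace = []
for r in range(4):
    trace += [(f"u{i}", False) for i in range(4)]
    trace.append((f"s{r}", True))
```

On this toy trace, inserting system lines closer to the LRU position leaves more of the user working set resident, so the miss count drops as the insertion point moves from MRU to MID to LRU.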
15/23
IPC Performance Improvement
[Figure: grouped bar chart, one group per benchmark: mcf, lbm, omnetpp, astar, soplex, sphinx3, sjeng, bzip2, gcc, gobmk, Avg. Y-axis: Aggr. IPC Normalized to Baseline (Higher is Better), 0.8 to 1.3. Legend: SYS-LRUinsert; SYS-MIDinsert; SYS-DYNAMIC.]
SYS-DYNAMIC improves performance for ALL applications by as much as 10% (avg. of 3%).
16/23
Talk Outline
• Motivation and Prior Work
• Measurement Methodology
• Intra-Application Interference Characterization
• Dynamic Mitigation of LLC Interference
– System-Aware Cache Management
– Real-System Dynamic Prefetch Manager
• Conclusion
17/23
Intra-application cache interference can also stem from hardware prefetching
[Figure: the four-core system (private L1I$, L1D$, and L2$ per SMT CPU core; shared 8MB L3 cache; memory controller; communication bridge; IO & QPI), annotated with its hardware prefetchers: L1 Instruction & Streamer Prefetchers, and Mid-Level Cache (MLC) Spatial & Streamer Prefetchers.]
18/23
Intra-Application Interference Caused by Hardware Prefetching
[Figure: bar chart of Application LLC Misses, one bar per benchmark: mcf, libquantum, lbm, omnetpp, astar, namd, povray, soplex, sphinx3, sjeng, bzip2, gcc, gobmk, h264ref. Y-axis: Miss Counts Normalized to System Default [ALL Prefetchers On], 0 to 3.]
With the MLC prefetcher OFF: fewer LLC misses for libquantum and sphinx3.
19/23
Dynamic Prefetch Management
• Use Nehalem’s Precise Event Based Sampling (PEBS)
• Sample application inst. count periodically.
[Figure: timeline. The run starts with MLC prefetchers ON; intervals of K instructions then alternate between MLC prefetchers ON and OFF, and RDTSC is read at the interval boundaries t0, t1, and t2.]
if (t2 - t1 > t1 - t0)
    Turn ON MLC prefetchers;
else
    Turn OFF MLC prefetchers;
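The decision rule on this slide can be written as a small sketch. It does not touch PEBS, RDTSC, or any prefetcher control register: timestamps are passed in as plain integers, and `set_prefetchers` is a hypothetical callback. It also assumes that the K instructions between t0 and t1 ran with the MLC prefetchers ON and the K instructions between t1 and t2 ran with them OFF, which is one reading of the timeline.

```python
def prefetchers_should_be_on(t0, t1, t2):
    """Apply the slide's comparison to three timestamp-counter readings.

    Assumption: [t0, t1] covers K instructions with MLC prefetchers ON and
    [t1, t2] covers K instructions with them OFF.
    """
    cycles_on = t1 - t0
    cycles_off = t2 - t1
    # If the OFF interval took longer, prefetching was helping: turn it ON.
    return cycles_off > cycles_on


def manage(samples, set_prefetchers):
    """Run the rule over a sequence of (t0, t1, t2) samples.

    set_prefetchers is a hypothetical callback; on real hardware it would
    toggle the MLC prefetchers, here it only receives the decision.
    """
    decisions = []
    for t0, t1, t2 in samples:
        on = prefetchers_should_be_on(t0, t1, t2)
        set_prefetchers(on)
        decisions.append(on)
    return decisions


# First sample: the OFF interval is slower (150 vs. 100 cycles), so turn ON.
# Second sample: the OFF interval is faster (150 vs. 200 cycles), so turn OFF.
log = []
decisions = manage([(0, 100, 250), (1000, 1200, 1350)], log.append)
```

Because both intervals cover the same number of instructions, comparing elapsed cycles is equivalent to comparing IPC for the two prefetcher settings.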
20/23
Dynamic Management Mitigating Prefetch-Induced LLC Interference
[Figure: grouped bar chart, one group per benchmark: mcf, libquantum, lbm, omnetpp, astar, namd, povray, soplex, sphinx3, sjeng, bzip2, gcc, gobmk, h264ref. Y-axis: Application LLC Miss Counts Normalized to System Default, 0 to 3. Legend: Prefetchers On (System Default); Prefetchers Off; Dynamic Management.]
Dynamic modulation of MLC prefetchers >> Static ON/OFF prefetch options.
21/23
Summary
• Dynamic System-Aware Cache Management
– Full-system evaluation (OS effects)
– Performance improvement by as much as 10% (on avg. 3%)
• Real-time Dynamic Prefetch Manager
– Real-system implementation on Nehalem PEBS
– 25% LLC miss count reduction: performance+, bandwidth & energy saving
22/23
Characterization and Dynamic Mitigation of Intra-Application Cache Interference
[Figure: the four-core system running a single application, App. 1. Requests reaching the shared 8MB L3 cache: App. data ld/st, TLB miss handling, HW prefetch req., other OS req.]
*Intra-application* cache interference from modern hardware prefetching & the OS influences app. performance significantly!
23/23
Characterization and Dynamic Mitigation of Intra-Application Cache Interference
Carole-Jean Wu and Margaret Martonosi {carolewu, mrm}@princeton.edu
2011 International Symposium on Performance Analysis of Systems and Software (ISPASS)