Characterization and Dynamic Mitigation of Intra-Application Cache Interference

2011 International Symposium on Performance Analysis on Systems and Software (ISPASS). Characterization and Dynamic Mitigation of Intra-Application Cache Interference. Carole-Jean Wu and Margaret Martonosi, Princeton University, 4/11/2011.
Page 1: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Carole-Jean Wu and Margaret Martonosi Princeton University

4/11/2011

2011 International Symposium on Performance Analysis on Systems and Software (ISPASS)

Page 2: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Today’s CMP systems

[Figure: block diagram of a chip multiprocessor (CMP). Four SMT CPU cores (Core 0 to Core 3), each with private L1I$, L1D$, and L2$, share an 8MB L3 cache, alongside a memory controller, a communication bridge, and IO & QPI. Several applications (App. 1 to App. 4) run on top of the operating system.]

Page 3: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Within a single application, cache interference can stem from…

[Figure: CMP block diagram (four SMT CPU cores with private L1I$, L1D$, and L2$, a shared 8MB L3 cache, memory controller, communication bridge, IO & QPI) running a single application, App. 1, on top of the operating system. Sources of cache traffic labeled in the figure: application data loads/stores, TLB miss handling, hardware prefetch requests, and other OS requests.]

Page 4: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Real-System LLC Miss Characterization

[Figure: stacked bar chart, percentage of LLC misses (0% to 100%) for mcf, libquantum, lbm, omnetpp, astar, namd, povray, soplex, sphinx3, sjeng, bzip2, gcc, gobmk, and h264ref. Series: application LLC misses; others (prefetching, page table walks, etc.).]

More than 50% of LLC misses are due to prefetching, TLB miss handling, other OS references, etc.

Page 5: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Prior Work for Intra-Application Cache Interference

• System-induced cache interference
  – Characterization indicates significant OS/user cache interference [Agarwal et al. TOC ’88] [Torrellas et al. ASPLOS ’92]
  – Reduce TLB miss handling effects [Jacob, Mudge ASPLOS ’98] [Bhargava et al. ASPLOS ’08] [Barr, Cox, and Rixner ISCA ’10]
• Prefetch-induced cache interference
  – Prefetch buffer/filter [Peir et al. ICS ’02] [Hur and Lin MICRO ’06]
  – Replacement policies (prefetch bit per cache line) [Alameldeen and Wood ISCA ’07] [Lin et al. HPCA ’01]
  – Prefetching algorithms [Ebrahimi et al. MICRO ’09] [Nesbit et al. ISCA ’07] [Iacobovici et al. ICS ’04]

But all require hardware modification.

Page 6: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Contributions of This Paper

1. Cache interference within an application is a problem
  – Real-system characterization
  – Detailed full-system simulation
2. Dynamic management mechanisms
  – System-aware cache management
  – Real-system, real-time prefetch manager

Page 7: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Talk Outline

• Motivation and Prior Work
• Measurement Methodology
• Intra-Application Interference Characterization
• Dynamic Mitigation of LLC Interference
  – System-Aware Cache Management
  – Real-System Dynamic Prefetch Manager
• Conclusion

Page 8: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Measurement Methodology

• Real-system infrastructure
  – Intel Nehalem-based Core i7 (Bloomfield)
  – perfmon2 to access hardware PMCs
• Full-system simulation
  – Simics/GEMS full-system simulation
• Benchmarks
  – SPEC CPU2006 benchmark suite

Page 9: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

System-Mode Reference Breakdown

[Figure: stacked bar chart, system-mode reference breakdown (0% to 100%) for mcf, lbm, omnetpp, astar, soplex, sphinx3, sjeng, bzip2, gcc, and gobmk. Series: page table walk references; other system-mode references.]

80% of system references are due to TLB miss handling (details in the paper).

Page 10: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Memory Reuse Characteristics Analysis for User References

[Figure: bar chart, percentage of zero-reused cache lines (0% to 100%) among user-mode references for mcf, sphinx3, sjeng, bzip2, and the average. Series: zero-reused cache lines [baseline]; zero-reused cache lines [user only]. Additional labels: User, System.]

System cache lines destroy the good data locality of user lines when sharing the cache!

Page 11: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Memory Reuse Characteristics Analysis for System References

[Figure: bar chart, percentage of zero-reused cache lines (0% to 100%) among system-mode references for mcf, sphinx3, sjeng, bzip2, and the average. Series: zero-reused cache lines [baseline]; zero-reused cache lines [system only]. Additional labels: User, System.]

The majority of system cache lines are not reused. Bypassing system cache lines?
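The two reuse slides above compare how many cache lines are never reused when user and system references share the cache ("baseline") versus when one class has the cache to itself. A minimal sketch of how such a zero-reuse fraction could be measured on a toy trace follows. Everything in it is an assumption for illustration, not the authors' Simics/GEMS setup: the function name `zero_reuse_fraction`, the tiny LRU cache geometry, the trace format, and the rule that a line is attributed to the mode that filled it.

```python
from collections import OrderedDict


def zero_reuse_fraction(trace, num_sets=4, ways=2, line_bytes=64):
    """Simulate a set-associative LRU cache and report, per mode, the
    fraction of filled lines that were never hit while resident.

    trace: iterable of (address, mode) pairs, mode is "user" or "system".
    A line is attributed to the mode that filled it (an assumption).
    """
    # One OrderedDict per set: key = line number, value = [mode, hit_count].
    # The first key is the LRU line, the last key is the MRU line.
    sets = [OrderedDict() for _ in range(num_sets)]
    filled = {"user": 0, "system": 0}
    zero_reused = {"user": 0, "system": 0}

    def retire(entry):
        mode, hits = entry
        if hits == 0:
            zero_reused[mode] += 1

    for addr, mode in trace:
        line = addr // line_bytes
        s = sets[line % num_sets]
        if line in s:
            s[line][1] += 1
            s.move_to_end(line)                # promote to MRU on a hit
            continue
        if len(s) == ways:
            _, victim = s.popitem(last=False)  # evict the LRU line
            retire(victim)
        s[line] = [mode, 0]                    # fill at MRU
        filled[mode] += 1

    # Lines still resident at the end of the trace are counted as well.
    for s in sets:
        for entry in s.values():
            retire(entry)

    return {m: (zero_reused[m] / filled[m] if filled[m] else 0.0)
            for m in filled}


# Toy trace: two system lines arrive between two uses of one user line.
mixed = [(0, "user"), (64, "system"), (128, "system"), (0, "user")]
user_only = [(a, m) for a, m in mixed if m == "user"]
baseline = zero_reuse_fraction(mixed, num_sets=1, ways=2)
isolated = zero_reuse_fraction(user_only, num_sets=1, ways=2)
```

In this toy trace the user line is reused when it has the cache to itself, but in the mixed trace the two system fills push it out before its second use, so every user fill ends up with zero reuse. That is the kind of effect the slides attribute to system lines sharing the cache.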

Page 12: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

System-Aware Cache Management

[Figure: the recency stack of one cache set, ordered from MRU to LRU, with an incoming reference 0xEEEA.]

Page 13: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

System-Aware Cache Management

[Figure: the recency stack of one cache set, ordered from MRU to LRU with a MID position marked. References shown: 0xEEEA, 0xDFAE, 0x1234, 0xDADA, 0xEEAF, 0x001A.]

Page 14: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

System-Aware Cache Management

SYS-LRUinsert

[Figure: the recency stack (MRU, MID, LRU) with references 0xEEEA, 0xDFAE, 0x1234, 0xDADA, 0xEEAF, …, each reference marked as a system or a user reference.]

Page 15: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

System-Aware Cache Management

SYS-MIDinsert

[Figure: the recency stack (MRU, MID, LRU) with references 0xEEEA, 0x1234, 0xDADA, 0xEEAF, …, each reference marked as a system or a user reference.]
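The SYS-LRUinsert and SYS-MIDinsert slides can be read as two choices of where a missing system line enters the recency stack. A minimal sketch under that reading follows. The slides do not spell out the details, so the following are assumptions: the function name `access`, the list-based stack, user fills going to MRU, and hits being promoted to MRU for both classes.

```python
def access(stack, tag, is_system, policy, ways=8):
    """Update one cache set's recency stack (index 0 = MRU, last = LRU).

    Returns True on a hit. `policy` is "LRU-insert" or "MID-insert" and only
    affects where a missing *system* line is placed; user lines fill at MRU.
    """
    if tag in stack:
        stack.remove(tag)
        stack.insert(0, tag)                # promote to MRU on a hit (assumption)
        return True
    if len(stack) == ways:
        stack.pop()                         # evict the line in the LRU position
    if not is_system:
        stack.insert(0, tag)                # user fill: MRU position
    elif policy == "MID-insert":
        stack.insert(len(stack) // 2, tag)  # system fill: middle of the stack
    else:
        stack.append(tag)                   # system fill: LRU position
    return False


# Three user lines, then two system lines, in a 4-way set.
lru_stack = []
for t in (0x1234, 0xDADA, 0xEEAF):
    access(lru_stack, t, is_system=False, policy="LRU-insert", ways=4)
access(lru_stack, 0xEEEA, is_system=True, policy="LRU-insert", ways=4)
access(lru_stack, 0xDFAE, is_system=True, policy="LRU-insert", ways=4)
```

With LRU insertion the second system line replaces the first one and the three user lines stay resident. With MID insertion a system line gets some time to be reused before it reaches the LRU position, at the cost of pushing user lines closer to eviction.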

Page 16: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

System-Aware Cache Management

SYS-DYNAMIC

[Figure: the recency stack (MRU, MID, LRU) with references 0xBEEF, 0x1234, 0xDADA, 0xEEAF, …, each reference marked as a system or a user reference.]

*Set sampling: DIP [Qureshi et al. ISCA ’07]
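The SYS-DYNAMIC slide cites DIP-style set sampling. In that scheme a few "leader" sets are pinned to each candidate policy, a saturating counter tracks which leaders miss more, and all other sets follow the policy that is currently missing less. A minimal sketch of that selection logic follows. It is a generic set-dueling chooser between two placeholder policies, "A" and "B"; the slide does not say which pair SYS-DYNAMIC duels. The class name, leader-set layout, and counter width are all assumptions.

```python
class PolicySelector:
    """Set-dueling chooser between two insertion policies, "A" and "B"."""

    def __init__(self, leader_stride=32, counter_bits=10):
        self.stride = leader_stride
        self.max_count = (1 << counter_bits) - 1
        self.midpoint = (self.max_count + 1) // 2
        self.psel = self.midpoint           # saturating counter, starts mid-range

    def leader(self, set_index):
        """Return "A" or "B" for a leader set, None for a follower set."""
        r = set_index % self.stride
        if r == 0:
            return "A"
        if r == 1:
            return "B"
        return None

    def record_miss(self, set_index):
        """A miss in a leader set counts against that set's policy."""
        owner = self.leader(set_index)
        if owner == "A":
            self.psel = min(self.psel + 1, self.max_count)
        elif owner == "B":
            self.psel = max(self.psel - 1, 0)

    def policy_for(self, set_index):
        """Leaders always use their own policy; followers use the winner."""
        owner = self.leader(set_index)
        if owner is not None:
            return owner
        # A high counter means policy A's leaders are missing more often.
        return "B" if self.psel >= self.midpoint else "A"


sel = PolicySelector()
for _ in range(5):
    sel.record_miss(0)      # misses in an "A" leader set
choice_after_a_misses = sel.policy_for(5)
```

A cache model would call `record_miss` on every LLC miss and `policy_for` on every fill, so the follower sets track whichever insertion policy suits the current program phase.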

Page 17: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

IPC Performance Improvement

[Figure: bar chart, aggregate IPC normalized to baseline (higher is better), y-axis from 0.8 to 1.3, for mcf, lbm, omnetpp, astar, soplex, sphinx3, sjeng, bzip2, gcc, gobmk, and the average. Series: SYS-LRUinsert; SYS-MIDinsert; SYS-DYNAMIC.]

SYS-DYNAMIC improves performance for ALL applications by as much as 10% (avg. of 3%).

Page 18: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Talk Outline

• Motivation and Prior Work
• Measurement Methodology
• Intra-Application Interference Characterization
• Dynamic Mitigation of LLC Interference
  – System-Aware Cache Management
  – Real-System Dynamic Prefetch Manager
• Conclusion

Page 19: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Intra-application cache interference can also stem from hardware prefetching

[Figure: CMP block diagram (four SMT CPU cores with private L1I$, L1D$, and L2$, a shared 8MB L3 cache, memory controller, communication bridge, IO & QPI), annotated with the hardware prefetchers: L1 instruction and streamer prefetchers; mid-level cache (MLC) spatial and streamer prefetchers.]

Page 20: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Intra-Application Interference Caused by Hardware Prefetching

[Figure: bar chart, application LLC miss counts normalized to the system default (all prefetchers on), y-axis from 0 to 3, for mcf, libquantum, lbm, omnetpp, astar, namd, povray, soplex, sphinx3, sjeng, bzip2, gcc, gobmk, and h264ref.]

With the MLC prefetcher OFF, libquantum and sphinx3 see fewer LLC misses.

Page 21: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Dynamic Prefetch Management

• Use Nehalem’s Precise Event Based Sampling (PEBS)
• Sample the application instruction count periodically.

[Figure: timeline. RDTSC is read at t0, t1, and t2, delimiting two sampling intervals of K instructions each, one labeled ON and one labeled OFF. Other labels in the figure: N; MLC prefetchers ON.]

if (t2 - t1 > t1 - t0)
    Turn ON MLC prefetchers;
else
    Turn OFF MLC prefetchers;
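The decision rule on this slide compares the cycles spent on two equal-length instruction samples. A minimal sketch of that comparison with a simulated clock follows. It does not touch PEBS, RDTSC, or any prefetcher control register. The callbacks `read_cycles`, `run_k_instructions`, and `set_prefetchers` are hypothetical stand-ins. The ordering (ON sample from t0 to t1, then OFF sample from t1 to t2) is an assumption inferred from the comparison itself.

```python
def decide_prefetch(t0, t1, t2):
    """Return True to turn the MLC prefetchers ON, False to turn them OFF.

    Assumes t0..t1 covers K instructions with the prefetchers ON and
    t1..t2 covers K instructions with them OFF.
    """
    # If the OFF sample took longer than the ON sample, prefetching helps.
    return (t2 - t1) > (t1 - t0)


def manage(read_cycles, run_k_instructions, set_prefetchers, periods):
    """Run `periods` sampling rounds and return the decision of each round."""
    decisions = []
    for _ in range(periods):
        set_prefetchers(True)
        t0 = read_cycles()
        run_k_instructions()          # K instructions, prefetchers ON
        t1 = read_cycles()
        set_prefetchers(False)
        run_k_instructions()          # K instructions, prefetchers OFF
        t2 = read_cycles()
        on = decide_prefetch(t0, t1, t2)
        set_prefetchers(on)           # setting used until the next sample
        decisions.append(on)
    return decisions


# Simulated machine: K instructions cost 100 cycles with prefetching, 150 without.
state = {"now": 0, "on": True}
cost = {True: 100, False: 150}


def read_cycles():
    return state["now"]


def run_k():
    state["now"] += cost[state["on"]]


def set_pf(on):
    state["on"] = on


helpful = manage(read_cycles, run_k, set_pf, periods=2)
```

Because both samples contain the same number of instructions, comparing elapsed cycles is equivalent to comparing IPC. A tie falls through to the `else` branch of the slide's rule, which turns the prefetchers OFF.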

Page 22: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Dynamic Management Mitigating Prefetch-Induced LLC Interference

[Figure: bar chart, application LLC miss counts normalized to the system default, y-axis from 0 to 3, for mcf, libquantum, lbm, omnetpp, astar, namd, povray, soplex, sphinx3, sjeng, bzip2, gcc, gobmk, and h264ref. Series: prefetchers on (system default); prefetchers off; dynamic management.]

Dynamic modulation of the MLC prefetchers does much better than the static ON/OFF prefetch options.

Page 23: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Summary

• Dynamic System-Aware Cache Management
  – Full-system evaluation (OS effects)
  – Performance improvement by as much as 10% (on avg. 3%)
• Real-time Dynamic Prefetch Manager
  – Real-system implementation on Nehalem PEBS
  – 25% LLC miss count reduction: performance improvement, bandwidth & energy saving

Page 24: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Characterization and Dynamic Mitigation of Intra-Application Cache Interference

[Figure: CMP block diagram (four SMT CPU cores with private L1I$, L1D$, and L2$, a shared 8MB L3 cache, memory controller, communication bridge, IO & QPI) running a single application, App. 1, on top of the operating system. Sources of cache traffic labeled in the figure: application data loads/stores, TLB miss handling, hardware prefetch requests, and other OS requests.]

*Intra-application* cache interference from modern hardware prefetching and the OS influences application performance significantly!

Page 25: Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Characterization and Dynamic Mitigation of Intra-Application Cache Interference

Carole-Jean Wu and Margaret Martonosi {carolewu, mrm}@princeton.edu

2011 International Symposium on Performance Analysis on Systems and Software (ISPASS)

