Date post: 26-Mar-2018
Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling J. Feliu, Julio Sahuquillo, S. Petit and J. Duato Universitat Politècnica de València Spain
Transcript
Page 1: Understanding Cache Hierarchy Contention in CMPs to ...personales.upv.es/jofepre/docs/IPDPS_2012_slides.pdf · degradation analysis •Cache-hierarchy bandwidth aware scheduler ...

Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling

J. Feliu, Julio Sahuquillo, S. Petit and J. Duato

Universitat Politècnica de València, Spain


Outline

• Introduction

• Experimental platform

• Benchmark characterization and performance degradation analysis

• Cache-hierarchy bandwidth aware scheduler

• Methodology and evaluation

• Conclusions


Introduction

• Multi-core processors have become the common implementation for high-performance microprocessors.

• The main performance bottleneck of CMPs lies in the main memory latency.


• As the number of cores and multithreading capabilities increase, the available memory bandwidth is becoming a major concern.


• Memory bandwidth aware schedulers can help to reduce memory contention by avoiding the concurrent execution of memory-hungry applications.


What about other contention points?


Introduction: Main contributions

• Two main contributions:

– Characterize the sensitivity of the SPEC CPU2006 benchmarks to each contention point in the memory hierarchy of a quad-core Intel Xeon, which motivates the need for the proposal.

– Propose a scheduling approach for multi-core processors with shared caches to improve performance.


Experimental platform: Specifications

Hardware specifications

CPU                Intel Xeon X3320
Frequency          2.5 GHz
Number of cores    4
Multithreading     No
L1 cache           Code L1: 4 x 32 KB, Data L1: 4 x 32 KB
L2 cache           2 x 3 MB
Main memory        4 GB DDR2

Software specifications

Operating system   Fedora Core 10 Linux
Kernel             2.6.29 with perfmon patch
Software           pfmon, libpfm
Benchmarks         SPEC CPU2006 with train input


Experimental platform: Performance counters

• A set of special-purpose registers built into modern processors.

• Store the counts of hardware-related activities within computer systems.

• Keep track of the events on a per-process basis.

Monitored event            Information
UNHALTED_CORE_CYCLES       Cycles
INSTRUCTIONS_RETIRED       Instructions
L2_RQSTS:MESI              L1 misses
LAST_LEVEL_CACHE_MISSES    L2 misses
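The four monitored events are enough to derive the IPC and MPKI figures used in the rest of the talk. The sketch below is only an illustration: the `CounterSample` container and the sample counts are hypothetical, and MPKI is taken in its usual sense of misses per thousand retired instructions.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw counts for one process over one interval (hypothetical container)."""
    unhalted_core_cycles: int
    instructions_retired: int
    l2_rqsts: int                 # L2_RQSTS:MESI, read as L1 misses
    last_level_cache_misses: int  # read as L2 misses


def ipc(sample: CounterSample) -> float:
    """Instructions per cycle."""
    return sample.instructions_retired / sample.unhalted_core_cycles


def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instruction."""
    return 1000.0 * misses / instructions


# Made-up counts, only to show the arithmetic.
sample = CounterSample(
    unhalted_core_cycles=2_000_000,
    instructions_retired=1_000_000,
    l2_rqsts=25_000,
    last_level_cache_misses=4_000,
)
sample_ipc = ipc(sample)
l1_mpki = mpki(sample.l2_rqsts, sample.instructions_retired)
l2_mpki = mpki(sample.last_level_cache_misses, sample.instructions_retired)
```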


Experimental platform: Intel Xeon X3320

Xeon X3320 memory hierarchy


Contention points related to the memory subsystem in the Xeon X3320


Cache hierarchy in the IBM Power 5

Power 5 memory hierarchy


Contention points related to the memory subsystem in the IBM Power 5


The more contention points, the more performance enhancement is expected.


Benchmark characterization and performance degradation analysis

• Benchmark characterization

– Classify the benchmarks as memory-bounded or L2-bounded.

– Build “interesting” mixes.

• Estimation of the performance degradation due to main memory and L2 contention

– Degradation over 60% due to main memory and around 13% due to L2 contention.

– Motivate the work.


Benchmark characterization: L1 MPKI & L2 MPKI


Performance degradation analysis: Microbenchmark

• Mimics the behavior of both memory-bounded and L2-bounded benchmarks.

• By setting the CACHE_LINE_SIZE and N parameters according to the target cache, the microbenchmark can miss in that cache while hitting in the lower levels.

• In the Intel Xeon X3320:

– Cache line size: 256 bytes (64 integers)

– L2-bounded: N = 128

– Memory-bounded: N = 12288
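The slides do not show the microbenchmark source, so the following is only an arithmetic check under one assumption: that the microbenchmark touches N blocks of CACHE_LINE_SIZE bytes each, with 4-byte integers. Under that assumption, the two values of N give footprints equal to the L1 data cache and L2 cache sizes listed for the Xeon X3320, which is consistent with sizing the working set against the target cache. The function and constant names are hypothetical.

```python
INT_BYTES = 4                     # assumed size of an integer
CACHE_LINE_SIZE = 64 * INT_BYTES  # 64 integers = 256 bytes, as on the slide

L1_DATA_BYTES = 32 * 1024         # per-core data L1 from the specification table
L2_BYTES = 3 * 1024 * 1024        # one shared L2 from the specification table


def footprint_bytes(n_blocks: int) -> int:
    """Bytes touched if the microbenchmark walks n_blocks blocks (assumption)."""
    return n_blocks * CACHE_LINE_SIZE


l2_bounded_footprint = footprint_bytes(128)        # 32768 bytes
memory_bounded_footprint = footprint_bytes(12288)  # 3145728 bytes
```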


Performance degradation analysis: Degradation due to memory contention (I)

• Commonly, the lower the MPKI of the benchmark, the lower the IPC degradation.

• Performance degradation is over 50% in some benchmarks when the co-runners have a high MPKI.

• A few benchmarks are barely affected by contention.

• Performance degradation is over 20% in most benchmarks, across different MPKI values of the co-runners.
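The slides report IPC degradation without stating the formula. A common definition, assumed here, is the relative IPC loss of a benchmark with co-runners compared with its stand-alone run. The function name and the sample values are hypothetical.

```python
def ipc_degradation(ipc_alone: float, ipc_with_corunners: float) -> float:
    """Fraction of stand-alone IPC lost when co-runners are present (assumed definition)."""
    if ipc_alone <= 0:
        raise ValueError("stand-alone IPC must be positive")
    return 1.0 - ipc_with_corunners / ipc_alone


# Made-up values: a benchmark dropping from 1.2 to 0.54 IPC loses 55% of its IPC.
example_degradation = ipc_degradation(1.2, 0.54)
```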


Performance degradation analysis: Degradation due to memory contention (II)

Four scenarios are analyzed:

– mem-b in 1

– mem-b in 2

– mem-b in 1+2

– mem-b in 1+2+3


Performance degradation analysis: Degradation due to memory contention (II)

• Some benchmarks suffer higher IPC degradation when the co-runner runs in the other bi-core, since memory is more frequently accessed.

• Other benchmarks suffer higher IPC degradation when the co-runner runs in the same bi-core. This can be caused by L2 cache conflicts or L2 bandwidth.

• In the common case, both degradations are similar.


Performance degradation analysis: Degradation due to L2 contention

• Only the benchmark and one co-runner are involved.

• Three benchmarks present a high IPC degradation (over 10%) with an L2-bounded co-runner.

• About half of the benchmarks present an IPC degradation close to (or over) 5% due to L2 bandwidth.


Although this degradation is lower than that caused by main memory contention, the trend is to increase the number of cores and shared caches. We therefore claim that cache-hierarchy bandwidth aware scheduling is needed, not only memory aware scheduling.


Cache-hierarchy memory aware scheduling

• Addresses the target bandwidth at each contention point.

• Schedules the processes in n steps (as many as cache levels).

• Top-down approach: from the MM to the L1 cache.

– Final step allocates the processes to cores.

• Inputs: for each process, its execution time and BTR_MM.


• When a quantum expires:

• Update the BTR values in each cache level for each executed process.

• Use these values as the predicted BTR for the next quantum.

• BTR values are only updated if there is no high contention. Otherwise, the previous BTR values are kept.
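The end-of-quantum update can be sketched as below. This is a minimal illustration, not the authors' code: how high contention is detected is not stated on the slide, so it is passed in as a flag, and the dictionary layout keyed by (process, cache level) is an assumption.

```python
from typing import Dict, Tuple

BtrTable = Dict[Tuple[str, str], float]  # (process, level) -> BTR, hypothetical layout


def update_predicted_btr(predicted: BtrTable,
                         measured: BtrTable,
                         high_contention: bool) -> BtrTable:
    """Return the BTR predictions to use for the next quantum."""
    if high_contention:
        # Measurements taken under high contention are discarded.
        return dict(predicted)
    updated = dict(predicted)
    updated.update(measured)  # only processes that ran this quantum are refreshed
    return updated


old = {("A", "MM"): 10.0, ("A", "L2"): 30.0, ("B", "MM"): 1.0}
seen = {("A", "MM"): 12.0, ("A", "L2"): 28.0}

kept = update_predicted_btr(old, seen, high_contention=True)
refreshed = update_predicted_btr(old, seen, high_contention=False)
```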


• BW_Remain is set to the total number of memory requests divided by the total execution time of the processes in stand-alone execution.

• Unfinished jobs are kept in a software queue structure.

• The process at the queue head is always selected to avoid process starvation.

• The selected processes are inserted at the queue tail when the quantum finishes.


• Then, the scheduler selects the remaining c - 1 processes that maximize the Fitness function*.

– The Fitness function estimates the gap between BTR_Remain and the predicted BTR of each process.

• BW_Remain and CPU_Remain (# of cores) are updated each time a process is selected.

• The result of this step is the list of processes to be executed, taking into account the MM bandwidth constraint.

* From D. Xu, C. Wu and P.-C. Yew, “On mitigating memory bandwidth contention through bandwidth-aware scheduling”, in PACT 2010
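The main-memory step can be sketched as follows. The slides cite the Fitness function but do not reproduce it, so the form used here, 1 / |BW_Remain / CPU_Remain - BTR_p|, is an assumption, as are all function names and the sample BTR values. The queue head is always taken first, and the remaining slots are filled greedily.

```python
from collections import deque
from typing import Deque, Dict, List


def fitness(bw_remain: float, cpu_remain: int, predicted_btr: float) -> float:
    """Assumed form: larger when the process BTR is close to the per-core share left."""
    gap = abs(bw_remain / cpu_remain - predicted_btr)
    return float("inf") if gap == 0 else 1.0 / gap


def select_for_quantum(queue: Deque[str], btr_mm: Dict[str, float],
                       total_bw: float, cores: int) -> List[str]:
    """Pick up to `cores` processes; removes the selected ones from `queue`."""
    head = queue.popleft()  # the head is always selected to avoid starvation
    selected = [head]
    bw_remain = total_bw - btr_mm[head]
    cpu_remain = cores - 1
    while cpu_remain > 0 and queue:
        best = max(queue, key=lambda p: fitness(bw_remain, cpu_remain, btr_mm[p]))
        queue.remove(best)
        selected.append(best)
        bw_remain -= btr_mm[best]
        cpu_remain -= 1
    return selected


# Made-up predicted main-memory BTRs for six queued processes.
btr_mm = {"A": 10.0, "B": 1.0, "C": 9.0, "D": 2.0, "E": 8.0, "F": 3.0}
ready = deque(["A", "B", "C", "D", "E", "F"])
chosen = select_for_quantum(ready, btr_mm, total_bw=20.0, cores=4)
left_in_queue = list(ready)
```

When the quantum finishes, the selected processes would be appended back at the tail, for example with `ready.extend(chosen)`.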


• For each level in the cache hierarchy with shared caches:

– AVG_BTR is set to the average BTR of the selected processes divided by the number of cache structures.

– BW_Remain is set to AVG_BTR for each cache and the processes are selected using the Fitness function, updating BW_Remain and CPU_Remain.

– The iteration in the last shared cache level allocates the processes to the concrete cores in its cache structure.
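For a machine with a single shared cache level, such as the two L2 caches of the Xeon X3320, the cache step might look like the sketch below. Two points are assumptions rather than facts from the slides: AVG_BTR is read as the summed L2 BTR of the selected processes divided by the number of caches, and the Fitness function has the form 1 / |BW_Remain / CPU_Remain - BTR_p|. All names and BTR values are hypothetical.

```python
from typing import Dict, List


def fitness(bw_remain: float, cpu_remain: int, predicted_btr: float) -> float:
    """Assumed form of the Fitness function."""
    gap = abs(bw_remain / cpu_remain - predicted_btr)
    return float("inf") if gap == 0 else 1.0 / gap


def allocate_to_cores(selected: List[str], btr_l2: Dict[str, float],
                      caches: List[List[int]]) -> Dict[str, int]:
    """Map each selected process to a core; `caches` lists the cores under each L2."""
    avg_btr = sum(btr_l2[p] for p in selected) / len(caches)
    pending = list(selected)
    placement: Dict[str, int] = {}
    for cores in caches:
        bw_remain = avg_btr  # each cache starts from the same bandwidth target
        cpu_remain = len(cores)
        for core in cores:
            if not pending:
                break
            best = max(pending,
                       key=lambda p: fitness(bw_remain, cpu_remain, btr_l2[p]))
            pending.remove(best)
            placement[best] = core
            bw_remain -= btr_l2[best]
            cpu_remain -= 1
    return placement


# Made-up L2 BTRs; cores 0-1 share one L2 and cores 2-3 share the other.
btr_l2 = {"A": 30.0, "F": 6.0, "D": 24.0, "E": 10.0}
placement = allocate_to_cores(["A", "F", "D", "E"], btr_l2, [[0, 1], [2, 3]])
```

With these values the two caches end up with summed BTRs of 34 and 36, close to the target of 35 each.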


Example


Evaluation methodology

• Evaluation is performed on the experimental platform.

• The proposal is implemented in a user-level scheduler (on a real machine).

– At the end of each quantum, the scheduler uses:

• PTRACE_ATTACH to block the execution of the processes.

• PTRACE_DETACH to unblock the execution of the processes.

• sched_setaffinity to allocate processes to cores.

– To evaluate the performance, a set of 10 mixes of eight benchmarks each was designed.
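The end-of-quantum actions can be sketched in Python as below. Note the substitution: the slides use PTRACE_ATTACH and PTRACE_DETACH, but Python's standard library has no ptrace binding, so this sketch swaps in SIGSTOP and SIGCONT to block and unblock processes. `os.sched_setaffinity` is the Linux-only standard-library wrapper for the affinity call. The function names and process IDs are hypothetical.

```python
import os
import signal
from typing import Dict, List, Optional, Tuple

Action = Tuple[str, int, Optional[int]]  # (kind, pid, core or None)


def plan_actions(all_pids: List[int], placement: Dict[int, int]) -> List[Action]:
    """Decide what to do with each process for the next quantum."""
    actions: List[Action] = []
    for pid in all_pids:
        if pid in placement:
            actions.append(("pin", pid, placement[pid]))
            actions.append(("resume", pid, None))
        else:
            actions.append(("stop", pid, None))
    return actions


def run_actions(actions: List[Action]) -> None:
    """Apply the plan to real processes (Linux only, needs permission over the pids)."""
    for kind, pid, core in actions:
        if kind == "pin":
            os.sched_setaffinity(pid, {core})  # restrict the process to one core
        elif kind == "resume":
            os.kill(pid, signal.SIGCONT)       # stand-in for PTRACE_DETACH
        else:
            os.kill(pid, signal.SIGSTOP)       # stand-in for PTRACE_ATTACH


# Made-up pids: 101 and 103 run next quantum on cores 0 and 2, 102 waits.
example_plan = plan_actions([101, 102, 103], {101: 0, 103: 2})
```

Separating the plan from its execution keeps the scheduling decision inspectable without touching real processes.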


Evaluation methodology

• The performance of the proposal is evaluated against:

– Memory-aware scheduler *.

– Linux OS scheduler.

• The schedulers differ in the selection process:

– The memory-aware scheduler selects suitable processes but does not allocate them to cores.

– The cache-hierarchy scheduler selects the processes and allocates them to cores.

* D. Xu, C. Wu, and P.-C. Yew, “On mitigating memory bandwidth contention through bandwidth-aware scheduling”, in PACT 2010


Scheduler performance: Speedup

Speedup over native Linux OS


Scheduler performance: BTR balancing (histogram)

BTR differences between the L2 shared caches


Scheduler performance: BTR balancing (average)

Average and variance of the difference between the BTRs of the L2 caches


Scheduler performance: BTR L2 difference evolution

Evolution over time of the BTR L2 difference in mix 2


Scheduler performance: BTR balancing on mix 2

BTR L2 difference in the first 160 quanta


Conclusions

• Performance can drop due to bandwidth contention located at different levels of the memory hierarchy.

• The current processor industry trend increases the number of contention points.

• Memory bandwidth aware schedulers only attack the main memory contention point.

• Cache-hierarchy bandwidth aware policy:

– Attacks all the contention points of the cache hierarchy.

– Increases the performance of the evaluated mixes by 30% with respect to memory bandwidth aware scheduling.


• Thank you very much for your attention

• Questions?


Evaluation methodology

• To deal with the different execution times of the benchmarks, a benchmark execution is set to the number of instructions required to reach an execution time of 120 seconds in stand-alone execution.

• Otherwise, a longest-job-first policy would provide the best performance in most mixes.

• The number of complete executions and the number of instructions of the last execution are measured and recorded offline.

• If the execution time of a benchmark is longer, the scheduler kills it when the target instructions have been executed. If it is shorter, the scheduler re-executes the benchmark several times.
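One reading of this offline step is that the 120-second target is converted into an instruction budget, which is then split into whole benchmark runs plus a partial last run. The sketch below follows that reading. The function name and the instruction rates and counts are hypothetical, not measurements from the talk.

```python
from typing import Tuple


def execution_plan(standalone_ips: int, instructions_per_run: int,
                   target_seconds: float = 120.0) -> Tuple[int, int, int]:
    """Return (target instructions, complete runs, instructions of the last run)."""
    target = int(standalone_ips * target_seconds)
    complete_runs, last_run_instructions = divmod(target, instructions_per_run)
    return target, complete_runs, last_run_instructions


# Short benchmark: 50 s stand-alone, so it is re-executed and cut in its third run.
short_plan = execution_plan(2_000_000_000, 100_000_000_000)

# Long benchmark: 300 s stand-alone, so it is killed partway through its first run.
long_plan = execution_plan(2_000_000_000, 600_000_000_000)
```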


• To evaluate the performance, a set of 10 mixes of eight benchmarks each was designed.

• Mixes present an ideal average bandwidth (IABW) between 20 and 40 trans/usec.

– Lower IABWs reduce the need for a memory-aware scheduler, since contention is low.

– Higher IABWs cannot take advantage of memory-aware scheduling, since all the scheduling possibilities reach high contention.


Performance degradation analysis: Degradation due to memory contention (II)

• Some benchmarks suffer higher IPC degradation when the co-runner runs in the other bi-core, since memory is more frequently accessed.

• Other benchmarks suffer higher IPC degradation when the co-runner runs in the same bi-core. This can be caused by L2 cache conflicts or L2 bandwidth.

• In the common case, both degradations are similar.

• The IPC degradation difference is lower from 1 to 2 co-runners than from 2 to 3 co-runners, since 2 co-runners are close to saturating the bandwidth.
