Performance Predictability Carole’s Group Talk on 5-13-2009
Transcript
Page 1: Performance Predictability Carole’s Group Talk on 5-13-2009.

Performance Predictability

Carole’s Group Talk on 5-13-2009

Page 2:

What if hmmer is the high-priority application?

[Bar chart: Normalized IPC over the application running alone (y-axis 0.5 to 1) for 9 heterogeneous applications: bzip2, gcc, gobmk, hmmer, lbm, mcf, milc, perlbench, and sjeng. Bars compare un-managed execution against running alone.]

Page 3:

Can we predict performance trends?

[Plot: Performance against Number of Instructions, showing the performance upper bound of the high-priority application and, below it, the performance of the high-priority (HP) application while running with other non-HP applications; the gap between the two is marked "???".]

Page 4:

Can we predict performance trends?

[Plot: as on the previous slide, Performance against Number of Instructions, showing the performance upper bound of the high-priority application and the performance of the HP application running with other non-HP applications, now annotated with the performance improvement due to the new dynamic resource allocation setting.]

Page 5:

If we can accurately predict the performance trend of hmmer by determining the degree of inter-application interference at run time, we can act early!

Page 6:

Studies have shown that there exists a high correlation between application performance and the number of long-latency cache misses.

We can use cache usage (e.g., in the last-level cache) to identify the degree of inter-application interference at run time, and thereby predict performance for applications of interest.
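This observation can be sketched as a simple linear performance model in which total cycles grow with the long-latency miss count. The base CPI and miss penalty below are illustrative assumptions, not numbers from the talk.

```python
def predict_ipc(instructions, llc_misses,
                base_cpi=1.0, miss_penalty_cycles=200):
    """Predict IPC from the long-latency (LLC) miss count, assuming
    cycles ~= instructions * base_cpi + llc_misses * miss_penalty."""
    cycles = instructions * base_cpi + llc_misses * miss_penalty_cycles
    return instructions / cycles

# More inter-application interference -> more LLC misses -> lower IPC.
ipc_alone = predict_ipc(1_000_000, llc_misses=1_000)
ipc_shared = predict_ipc(1_000_000, llc_misses=5_000)
```

Under this model, estimating the extra misses caused by co-runners directly yields a performance-trend prediction for the HP application.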

Page 7:

Outline

Motivation
Observation Cache & Observation Sets
Experimental Setup
Results
Conclusion

Page 8:

How do we identify inter-application conflict misses?

[Diagram: Observation Cache. Two cache sets are shown, each ordered from LRU to MRU: one holds the non-HP lines E, F, G, H, the other the HP lines A, B, C, D. Memory reference I arrives. A, B, C, D, I are HP references; E, F, G, H are non-HP references.]

Page 9:

How do we identify inter-application conflict misses?

[Diagram: Observation Cache. One set now holds F, G, H, I, the other the HP lines A, B, C, D (each ordered LRU to MRU). Memory reference A arrives. A, B, C, D, I are HP references; E, F, G, H are non-HP references.]

Is A an inter-application conflict miss? No!

Page 10:

How do we identify inter-application conflict misses?

[Diagram: Observation Cache. One set holds F, G, H, I, the other the HP lines A, B, C, D (each ordered LRU to MRU). A per-set counter, indexed 0 to set_assoc-1, tracks the number of non-HP cache lines; here it reads 3. A, B, C, D, I are HP references; E, F, G, H are non-HP references.]

if ((set_assoc - lru_hit_cnt) <= num_non_HP_lines) {
    inter_app_miss++;
    de_allocate(address);
} else {
    non_inter_app_miss++;
}
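The classification above can be sketched in software. This is my interpretation of the slide's pseudocode: a shared set (HP and non-HP lines mixed) is paired with an observation set that sees only HP references; lru_hit_cnt is taken as the hit's depth from MRU, cold misses are counted as non-inter-application, and the de_allocate step is omitted. All names are mine.

```python
class ObservationSet:
    """One shared cache set plus its observation (HP-only) shadow."""

    def __init__(self, assoc=4):
        self.assoc = assoc
        self.shared = []   # (addr, is_hp) pairs, MRU at index 0
        self.obs = []      # HP-only addresses, MRU at index 0
        self.inter_app_miss = 0
        self.non_inter_app_miss = 0

    def access(self, addr, is_hp):
        shared_hit = any(a == addr for a, _ in self.shared)

        if is_hp and not shared_hit:
            if addr in self.obs:
                # Depth of the hit in the observation LRU stack (0 = MRU)
                lru_hit_cnt = self.obs.index(addr)
                num_non_hp = sum(1 for _, hp in self.shared if not hp)
                # Slide's test: (set_assoc - lru_hit_cnt) <= #_non_HP_lines
                if (self.assoc - lru_hit_cnt) <= num_non_hp:
                    self.inter_app_miss += 1
                else:
                    self.non_inter_app_miss += 1
            else:
                # Cold miss: not caused by other applications (assumption)
                self.non_inter_app_miss += 1

        if is_hp:  # the observation cache sees only HP references
            if addr in self.obs:
                self.obs.remove(addr)
            self.obs.insert(0, addr)
            del self.obs[self.assoc:]

        # The shared cache sees every reference
        self.shared = [(a, hp) for a, hp in self.shared if a != addr]
        self.shared.insert(0, (addr, is_hp))
        del self.shared[self.assoc:]

s = ObservationSet(assoc=4)
for addr in "ABCD":            # HP working set fills both sets
    s.access(addr, is_hp=True)
for addr in "EFG":             # non-HP traffic evicts HP lines
    s.access(addr, is_hp=False)
s.access("A", is_hp=True)      # miss caused by the non-HP lines
```

In this trace the final access to A misses in the shared set but hits deep in the observation stack, so it is classified as an inter-application conflict miss.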

Page 11:

Do we really need all this extra hardware?

• 4MB observation cache + 4096 32-bit counters (one per cache set)

Hopefully not!!

Page 12:

Approach 1: Dynamic profiling within way-partitioning infrastructure

Given a way-partitioned 16-way set-associative cache in which n ways (n < 8) are dedicated to the HP application, use those n HP-exclusive ways as the observation cache to measure the degree of inter-application interference, and the remaining (16 - n) ways as the normal shared cache.
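A minimal sketch of this approach, assuming HP references allocate in both the exclusive and the shared portions (the talk does not spell this out): an HP reference that hits in the interference-free exclusive ways but misses in the shared ways is attributed to inter-application interference. Class and counter names are mine.

```python
class WayPartitionedSet:
    """One way-partitioned set: n HP-exclusive ways double as the
    observation cache; the remaining ways form the shared cache."""

    def __init__(self, assoc=16, n_hp_ways=4):
        self.obs = []        # HP-exclusive ways, MRU at index 0
        self.shared = []     # shared ways, MRU at index 0
        self.obs_ways = n_hp_ways
        self.shared_ways = assoc - n_hp_ways
        self.interference_misses = 0

    @staticmethod
    def _lru_access(stack, addr, ways):
        # Standard LRU update; returns whether the access hit.
        hit = addr in stack
        if hit:
            stack.remove(addr)
        stack.insert(0, addr)
        del stack[ways:]
        return hit

    def access(self, addr, is_hp):
        shared_hit = self._lru_access(self.shared, addr, self.shared_ways)
        if is_hp:
            obs_hit = self._lru_access(self.obs, addr, self.obs_ways)
            # Present in the interference-free ways but evicted from the
            # shared ways: attribute the shared miss to interference.
            if obs_hit and not shared_hit:
                self.interference_misses += 1

s = WayPartitionedSet(assoc=8, n_hp_ways=4)  # small example set
for addr in "ABCD":                          # HP working set
    s.access(addr, is_hp=True)
for addr in "EFGH":                          # non-HP traffic evicts A-D
    s.access(addr, is_hp=False)
s.access("A", is_hp=True)                    # counted as interference
```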

Page 13:

Approach 1: Dynamic profiling within way-partitioning infrastructure

[Diagram: way-partitioned set. The HP-exclusive ways serve as the Observation Cache and hold C, B, A (MRU to LRU); the shared ways hold E, F, G, I. The counter of non-HP cache lines reads 4 as memory reference H arrives.]

Page 14:

Approach 2: Set sampling

There are 4096 sets in a 16-way set-associative, 4MB cache with 64-byte lines. Dedicate a small part of the cache as "leader sets" and use the rest as "follower sets" [Utility-Based Cache Partitioning, Qureshi and Patt, MICRO '06].
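Leader-set selection can be sketched as follows. The even-stride policy and the address-to-set mapping below are my assumptions (the talk only says a small part of the cache is set aside), with the 64-byte lines and 4096 sets taken from the stated configuration.

```python
LINE_SIZE = 64     # bytes per cache line (from the talk's configuration)
NUM_SETS = 4096    # sets in the 4MB, 16-way L2

def set_index(addr):
    """Map a byte address to its cache set index."""
    return (addr // LINE_SIZE) % NUM_SETS

def leader_sets(fraction, num_sets=NUM_SETS):
    """Pick evenly spaced leader sets; all other sets are followers."""
    stride = max(1, round(1 / fraction))
    return {s for s in range(num_sets) if s % stride == 0}

leaders = leader_sets(0.10)   # ~10% of the sets observe misses

def is_observed(addr):
    """Should this reference update the sampled miss counters?"""
    return set_index(addr) in leaders
```

Only references that map to a leader set update the observation counters; the follower sets behave as a normal shared cache.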

Page 15:

Approach 2: Set sampling, version 1

[Diagram: set sampling, version 1. For the same set index, one sampled set sees only HP references (P1, P5, P7, P9, P2, P3, P8, P0, MRU to LRU) and records m1 misses, while a second copy sees the full mixed stream (E1, E2, G8, P1, E4, F6, G1, P5) and records m1' misses. Pi are HP references; Ei, Gi are non-HP references.]

The number of inter-application conflict misses is (m1' - m1) for the high-priority application!
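The counter arithmetic can be made concrete; here I assume m1 is the miss count of the HP-only sampled set and m1' that of the fully shared one, which the slide implies but does not state explicitly.

```python
def inter_app_conflict_misses(m1_prime, m1):
    """Inter-application conflict misses for the HP application:
    misses seen by the shared sampled set (m1') minus the misses the
    HP application would incur running alone (m1)."""
    return m1_prime - m1

# e.g., 120 misses under sharing vs. 45 misses alone
extra = inter_app_conflict_misses(120, 45)
```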

Page 16:

Approach 2: Set sampling, version 2

[Diagram: set sampling, version 2. An observation set sees only HP references (P1, P5, P7, P9, P2, P3, P8, P0); an extended set holds the mixed stream (E1, E2, G8, P4, E4, F6, G1, P6) plus a counter of non-HP cache lines, here reading 6. Pi are HP references; Ei, Gi are non-HP references.]

Page 17:

Outline

Motivation
Observation Cache & Observation Sets
Experimental Setup
Results
Conclusion

Page 18:

Experimental Setup

• GEMS: Simics (in-order) + Ruby (memory module)
• 8-core CMP
• Private L1 cache per core (32KB, 4-way set-assoc., 64-byte cache lines)
• Shared L2 cache (4MB, 16-way set-assoc., 64-byte cache lines): 4096 cache sets
• 4 SPEC2006 applications in the workload: bzip2, mcf, gobmk, and hmmer [10 billion cycles]

Page 19:

Miss Identification for Applications in the Workload

[Four plots, one per application in the bmgh workload (bzip2, mcf, gobmk, hmmer): Number of Misses against Time (cycles), each separating InterApp_Misses from NonInterApp_Misses.]

Page 20:

How well do observation sets perform [10%]?

[Two plots, for bzip2 and mcf: Number of Misses and Error Rate (%) against Time (cycles), comparing InterApp_Misses with InterApp_Miss as measured by the observation sets.]

Page 21:

How well do observation sets perform [10%]?

[Two plots, for gobmk and hmmer: Number of Misses and Error Rate (%) against Time (cycles), comparing InterApp_Misses with InterApp_Miss as measured by the observation sets.]

Page 22:

# of observation sets vs. prediction accuracy

[Bar chart: Error Rate (%), from 0 to 10, for bzip2, mcf, gobmk, hmmer, and the average, with observation-set fractions of 10%, 40%, 64%, 67%, and 100%.]

10% of the 4MB cache as observation sets offers >99% prediction accuracy

1% of the 4MB cache as observation sets offers >97% prediction accuracy

Page 23:

Can we accurately predict the performance trend by determining the degree of inter-application interference at run time?

Hopefully, I have convinced you that it is possible!

Page 24:

Conclusion

Dynamic profiling is powerful: it is feasible, with no pre-run (static profiling) ever!

Accurate run-time prediction of performance trends provides: performance predictability, quality of service, and better resource allocation decisions.

With 10% of the 4MB cache as observation sets, we can achieve >99% prediction accuracy; with 1%, >97% prediction accuracy.

Page 25:

Still awake?

Thank you

Next…

Page 26:

Summer internship with Google on power usage prediction algorithms for Google's data centers. You're welcome to visit me in Mountain View, CA!!!

Page 27:

Cache Occupancy per Application

[Stacked area chart: Cache Occupancy (%) per application, from 0 to 100, against Time (cycles) for the bmgh workload, broken down into OS activities, hmmer, gobmk, mcf, and bzip2.]

