Date post: 28-Dec-2015
SYNAR Systems Networking and Architecture Group

Scheduling on Heterogeneous Multicore Processors Using Architectural Signatures

Daniel Shelepov and Alexandra Fedorova
School of Computing Science, Simon Fraser University, Vancouver, Canada

Architectural Signatures in a Nutshell

Task: to schedule jobs appropriately given a variety of different cores available

Caveats:

• Scheduler doesn’t know job behaviour a priori
• Scalability: hundreds of cores potentially available

Our approach:

• Analyze job performance offline
• Describe findings in a job’s architectural signature
• Scheduler uses signatures to make intelligent core assignment decisions


Talk Outline

• Background
• Methodology
• Results
• Summary and Future Work


Background: Heterogeneous CPUs

Heterogeneous CPUs = several types of cores:

• Simple vs. complex: cache size, issue width, presence of advanced features, power consumption
• Specialized (possibly). Example: many FPUs

Heterogeneous CPUs also:

• Expose a common ISA
• May contain 100s or 1000s of cores (“manycore”)
• Bottom line: better efficiency = saved power

[Figure: now, homogeneous multicore CPUs; future, heterogeneous multi- and manycore CPUs. Core types shown: complex, simple, specialized]


Background: Heterogeneous Scheduling

Scheduler needs to be aware of:

• underlying core features
• job performance on various cores

Otherwise, no informed scheduling decision can be made => no benefit from heterogeneity

[Figure: Scheduler, marked with a question mark]


Architectural Signature Approach

A signature is provided along with the job binary.

Signatures:

• are constructed offline
• are μarch.-independent
• provide guidance for selecting appropriate cores

[Figure: Scheduler, marked with a check mark]


Constructing Signatures

OFFLINE PROFILING

Collect microarchitecture-independent profiling data

Examples: instruction mix, memory access patterns

OFFLINE ANALYSIS

Generate performance-predicting metrics that a scheduler is able to use

Examples: optimal cache size, inherent ILP, clock speed sensitivity

PREDICTION MODEL

Create a model for generating meaningful performance-predicting metrics from collected profiling data

SCHEDULING

Interpret performance-predicting metrics and schedule


Case Study: Clock Speed Sensitivity

Frequency changes affect different jobs differently.

Clock speed sensitivity is the means to capture these differences.

[Figure: Completion time at different clock speeds. x-axis: core frequency (3GHz, 2.67GHz, 2.33GHz, 2GHz); y-axis: normalized completion time (0 to 1.75); series: swim, eon]


Offline Profiling

• We use MICA, a custom toolkit for Pin by Hoste and Eeckhout [2] (http://trappist.elis.ugent.be/~kehoste/MICA/).
• MICA gathers a variety of μarch.-independent metrics.
• For clock speed sensitivity, we want reuse distance data.


Offline Analysis

• Reuse distances are used to estimate abstract L2 cache miss rates.
• L2 cache miss rates are used to estimate clock speed elasticity, a metric that puts a number on sensitivity.
  • This requires a prediction model for elasticity as a function of cache miss rate (see next slide).
• Elasticity values are placed into the architectural signature.
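As a rough illustration of the first step, the sketch below turns a reuse distance histogram into a miss rate per 1000 instructions. It assumes a fully associative LRU cache, where an access misses when at least as many distinct lines as the cache holds were touched since the previous access to the same line. That cache model, the function name and the histogram layout are assumptions made for this example; they are not taken from the slides or from MICA's output format.

```python
from typing import Dict


def estimate_misses_per_kilo_inst(
    reuse_histogram: Dict[int, int],
    cold_accesses: int,
    cache_lines: int,
    instructions: int,
) -> float:
    """Estimate misses per 1000 instructions for a fully associative LRU cache.

    reuse_histogram maps a reuse distance (distinct cache lines touched
    between two accesses to the same line) to the number of accesses
    observed at that distance. cold_accesses counts first-time accesses,
    which always miss. All names here are hypothetical.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    if cache_lines <= 0:
        raise ValueError("cache_lines must be positive")
    # An access hits only if fewer than cache_lines distinct lines intervened.
    capacity_misses = sum(
        count
        for distance, count in reuse_histogram.items()
        if distance >= cache_lines
    )
    return 1000.0 * (capacity_misses + cold_accesses) / instructions


# Invented histogram, for illustration only.
example_histogram = {0: 500, 10: 300, 5000: 150, 20000: 40}
small_cache = estimate_misses_per_kilo_inst(example_histogram, 10, 4096, 100000)
large_cache = estimate_misses_per_kilo_inst(example_histogram, 10, 8192, 100000)
```

Because the histogram does not depend on any particular cache, the same profile can be re-evaluated for several cache sizes without profiling the job again.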


Prediction Model

• The graph shows a mapping of SPEC CPU benchmarks displaying estimated L2 miss rates and clock speed elasticity
• We build a linear model and then use it to predict elasticity during offline analysis
• Constructed once, it can be used for all future analysis, unless a better model is proposed

[Figure: clock frequency elasticity (y-axis, -1.1 to -0.1, annotated "More sensitive" and "Less sensitive") plotted against L2 miss rate per 1000 instructions (x-axis, 0 to 20)]
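A linear model of this kind can be fitted with ordinary least squares. The sketch below is one possible way to do it in plain Python; the function names are hypothetical, and the training points are invented placeholders rather than the measured SPEC CPU values from the graph. The placeholders assume that jobs with higher miss rates are less sensitive to clock speed.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_elasticity(miss_rate: float, slope: float, intercept: float) -> float:
    """Apply the fitted model to an estimated L2 miss rate (per 1000 inst.)."""
    return slope * miss_rate + intercept


# Placeholder training data: (misses per 1000 instructions, measured elasticity).
miss_rates = [0.0, 5.0, 10.0, 20.0]
elasticities = [-1.0, -0.8, -0.6, -0.2]
slope, intercept = fit_line(miss_rates, elasticities)
predicted = predict_elasticity(15.0, slope, intercept)
```

Once `slope` and `intercept` are known, offline analysis only needs the estimated miss rate of a new job to produce its elasticity value.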


Scheduling

• Recall: the architectural signature contains elasticity values
• Elasticity is straightforward to interpret
• Using elasticity, the scheduler categorizes jobs as highly sensitive, moderately sensitive or insensitive
• Finally, we’re ready to schedule
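The slides do not spell out the category thresholds or the assignment policy, so the following is only a sketch under stated assumptions: two hypothetical elasticity thresholds (-0.7 and -0.3) define the three categories, and a greedy policy gives the most sensitive jobs the fastest cores, one job per core. Job and core names are made up.

```python
from typing import Dict


def categorize(elasticity: float, high: float = -0.7, low: float = -0.3) -> str:
    """Map an elasticity value to a sensitivity category.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    if elasticity <= high:
        return "highly sensitive"
    if elasticity <= low:
        return "moderately sensitive"
    return "insensitive"


def assign_jobs(
    job_elasticity: Dict[str, float], core_ghz: Dict[str, float]
) -> Dict[str, str]:
    """Greedy placement: the most sensitive job gets the fastest core.

    Assumes one job per core; this policy is an assumption for illustration.
    """
    if len(job_elasticity) > len(core_ghz):
        raise ValueError("more jobs than cores")
    # Most negative elasticity (most sensitive) first.
    jobs = sorted(job_elasticity, key=lambda job: job_elasticity[job])
    # Highest clock frequency first.
    cores = sorted(core_ghz, key=lambda core: core_ghz[core], reverse=True)
    return dict(zip(jobs, cores))


# Hypothetical jobs and a machine with two 2GHz and two 3GHz cores.
jobs = {"job_a": -0.95, "job_b": -0.15, "job_c": -0.9, "job_d": -0.4}
cores = {"cpu0": 2.0, "cpu1": 2.0, "cpu2": 3.0, "cpu3": 3.0}
placement = assign_jobs(jobs, cores)
categories = {job: categorize(value) for job, value in jobs.items()}
```

With these inputs, the two jobs closest to -1.0 end up on the 3GHz cores, while the remaining jobs take the 2GHz cores, where their completion time suffers least.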


Clock Speed Sensitivity Data Flow

MICA reuse distance data

abstract L2 cache miss rates

clock speed elasticity values

clock speed sensitivity category


Evaluating Clock Speed Sensitive Scheduling

Completion times with our clock speed aware prototype, normalized to completion times with the default Linux 2.6.18 scheduler

[Figure: three bar charts of relative change in completion time (y-axis, 0.75 to 1.2), each ending with a geometric mean bar. Benchmarks per chart: wupwise, mgrid, apsi, facerec; gcc, gap, eon, fma3d, mcf, equake, wupwise, lucas; eon, crafty, mcf, equake]

Workload and core configurations shown:

• Highly heterogeneous workload. Two 2GHz cores, two 3GHz cores.
• Balanced workload. One each of 2GHz, 2.33GHz, 2.67GHz and 3GHz cores.
• Uniform workload. Two 2GHz cores, two 3GHz cores.


Summary

• A framework for developing microarchitecture-independent architectural signatures to assist heterogeneity-aware scheduling
• Proof of concept: clock speed aware scheduling
• Results: tangible benefits even on mildly heterogeneous platforms
  • up to 4% average throughput increase on a multicore system with 2GHz and 3GHz cores


Future Work

• Extend our framework to include other core characteristics (cache size, issue width, ...)
• Develop and analyze a heterogeneity-aware scheduler in a real operating system (Sun Solaris)
• Compare that scheduler with other heterogeneity-aware schedulers


References

[1] M. Becchi and P. Crowley. Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures. In Proceedings of the Conference on Computing Frontiers, 2006.

[2] K. Hoste and L. Eeckhout. Microarchitecture-Independent Workload Characterization. IEEE Micro Hot Tutorials, 27(3):63-72, 2007.

[3] R. Kumar, D. M. Tullsen, P. Ranganathan, N. Jouppi, and K. Farkas. Single-ISA Heterogeneous Multicore Architectures for Multithreaded Workload Performance. In Proceedings of the 31st Annual International Symposium on Computer Architecture, 2004.


Appendix A: Existing Approaches

• Algorithms by Becchi [1] and Kumar [3]
• These rely on performance monitoring to determine optimal assignment.
• Potential drawbacks:
  • don’t scale well to many types of cores
  • limited applicability to short-lived threads

[Figure: Scheduler, marked with a check mark]


Appendix B: Input Sets and Performance

• Varying input sets can drastically affect performance
  • ref vs. test input in SPEC CPU2000
• One architectural signature can cover at most one input set
• This is a difficult problem that we are not currently tackling
• There are smart ways to create parameterized approximations that account for data input size:
  • Y. Zhong, S. G. Dropsho and C. Ding. Miss rate prediction across all program inputs. In Proceedings of Parallel Architectures and Compilation Techniques, 2003.


Appendix C: Elasticity

• We need two measurements of completion time at two different frequencies
• Then we calculate clock speed elasticity of completion time as follows (E = elasticity, T = completion time, F = clock speed):

\[
E_{T,F} = \frac{T_2 - T_1}{F_2 - F_1} \cdot \frac{F_1 + F_2}{T_1 + T_2}
\]

• The larger the magnitude, the more sensitive the completion time is to clock speed
• In this case, -1.0 is considered very elastic (sensitive), because it means that an increase in frequency by a factor of X will decrease the completion time by the same factor
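To make the definition concrete, here is a small sketch that computes elasticity from two (completion time, frequency) measurements using the midpoint form E = ((T2 - T1) / (F2 - F1)) * ((F1 + F2) / (T1 + T2)). The function name and the sample timings are hypothetical and chosen only to show the two extremes.

```python
def clock_speed_elasticity(t1: float, f1: float, t2: float, f2: float) -> float:
    """Elasticity of completion time with respect to clock speed.

    Uses E = ((T2 - T1) / (F2 - F1)) * ((F1 + F2) / (T1 + T2)),
    where (t1, f1) and (t2, f2) are measurements at two frequencies.
    """
    if f1 == f2:
        raise ValueError("the two measurements must use different frequencies")
    if t1 + t2 == 0:
        raise ValueError("completion times must not sum to zero")
    return ((t2 - t1) / (f2 - f1)) * ((f1 + f2) / (t1 + t2))


# Invented timings in seconds at 2GHz and 3GHz.
# Completion time scales inversely with frequency: fully sensitive job.
fully_sensitive = clock_speed_elasticity(300.0, 2.0, 200.0, 3.0)
# Completion time does not change with frequency: insensitive job.
insensitive = clock_speed_elasticity(300.0, 2.0, 300.0, 3.0)
```

The first call evaluates to about -1.0, because a 1.5x frequency increase cuts completion time by the same factor; the second evaluates to 0, because frequency has no effect on the job.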


Appendix D: Different Cache Sizes

• L2 miss rates (and elasticity) depend heavily on cache size => it has to be taken into account
• Solution: calculate miss rates and elasticity for common cache configurations; the scheduler picks the appropriate values
• This is a reasonable approach, because cache size aware scheduling takes precedence over clock speed aware scheduling
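One way to picture this is a signature that stores one elasticity value per profiled L2 size, with the scheduler looking up the entry that matches the core under consideration. The sketch below assumes a dictionary keyed by cache size in KB and a simple selection rule (largest profiled size that does not exceed the core's L2, falling back to the smallest entry). The layout, the rule and the sample values are all assumptions for illustration.

```python
from typing import Dict


def elasticity_for_core(signature: Dict[int, float], core_l2_kb: int) -> float:
    """Pick the elasticity entry that best matches a core's L2 size.

    signature maps a profiled L2 size in KB to a predicted elasticity.
    The selection rule is a hypothetical choice, not taken from the talk.
    """
    if not signature:
        raise ValueError("signature has no cache configurations")
    fitting = [size for size in signature if size <= core_l2_kb]
    # Fall back to the smallest profiled size if the core's cache is smaller.
    chosen = max(fitting) if fitting else min(signature)
    return signature[chosen]


# Invented signature: elasticity predicted for three common L2 sizes.
example_signature = {512: -0.4, 1024: -0.6, 4096: -0.9}
on_2mb_core = elasticity_for_core(example_signature, 2048)
on_256kb_core = elasticity_for_core(example_signature, 256)
```

Storing a handful of precomputed entries keeps the scheduler's work to a lookup, at the cost of approximating cores whose cache size was not profiled.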

