SYNAR Systems Networking and Architecture Group
Scheduling on Heterogeneous Multicore Processors Using
Architectural Signatures
Daniel Shelepov and Alexandra Fedorova
School of Computing Science, Simon Fraser University, Vancouver, Canada
Architectural Signatures in a Nutshell
Task: to schedule jobs appropriately given a variety of different cores available
Caveats:
• Scheduler doesn’t know job behaviour a priori
• Scalability: hundreds of cores potentially available
Our approach:
• Analyze job performance offline
• Describe findings in a job’s architectural signature
• Scheduler uses signatures to make intelligent core assignment decisions
Talk Outline
• Background
• Methodology
• Results
• Summary and Future Work
Background: Heterogeneous CPUs
Heterogeneous CPUs = several types of cores:
• Simple vs. complex: cache size, issue width, presence of advanced features, power consumption
• Specialized (possibly). Example: many FPUs
Expose a common ISA
May contain 100s or 1000s of cores (“manycore”)
Bottom line: better efficiency = saved power
[Figure: now, homogeneous multicore CPUs; future, heterogeneous multi- and manycore CPUs. Core types shown: complex, simple, specialized.]
Background: Heterogeneous Scheduling
Scheduler needs to be aware of:
• underlying core features
• job performance on various cores
Otherwise, no informed scheduling decision can be made => no benefit from heterogeneity
[Figure: scheduler, marked "?"]
Architectural Signature Approach
A signature is provided along with the job binary.
Signatures
• are constructed offline
• are μarch.-independent
• provide guidance for selecting appropriate cores
[Figure: scheduler, marked with a check mark]
Constructing Signatures
OFFLINE PROFILING: Collect microarchitecture-independent profiling data
• Examples: instruction mix, memory access patterns
OFFLINE ANALYSIS: Generate performance-predicting metrics that a scheduler is able to use
• Examples: optimal cache size, inherent ILP, clock speed sensitivity
PREDICTION MODEL: Create a model for generating meaningful performance-predicting metrics from collected profiling data
SCHEDULING: Interpret performance-predicting metrics and schedule
Case Study: Clock Speed Sensitivity
Frequency changes affect different jobs differently.
Clock speed sensitivity is the means to capture these differences.
[Figure: Completion time at different clock speeds. x-axis: core frequency (3GHz, 2.67GHz, 2.33GHz, 2GHz); y-axis: normalized completion time (0 to 1.75); series: swim, eon.]
Offline Profiling
We use MICA, a custom toolkit for Pin by Hoste and Eeckhout [2] (http://trappist.elis.ugent.be/~kehoste/MICA/).
MICA gathers a variety of μarch.-independent metrics.
For clock speed sensitivity, we want reuse distance data.
Offline Analysis
Reuse distances are used to estimate abstract L2 cache miss rates.
L2 cache miss rates are used to estimate clock speed elasticity, a metric that puts a number on sensitivity.
• requires a prediction model for elasticity as a function of cache miss rate (see next slide)
Elasticity values are placed into the architectural signature.
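The first step, going from reuse distances to a miss rate, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a fully associative LRU cache, for which an access misses exactly when its reuse distance (the number of distinct cache lines touched since the previous access to the same line) is at least the cache capacity in lines. The histogram layout, the function name and the sample numbers are hypothetical and do not reflect MICA's actual output format.

```python
def estimate_misses_per_kilo_inst(reuse_hist, cold_accesses, cache_lines, instructions):
    """Estimate L2 misses per 1000 instructions from a reuse distance histogram.

    reuse_hist: dict mapping reuse distance (in distinct cache lines) to the
        number of memory accesses observed with that distance (hypothetical
        layout, not MICA's real output format).
    cold_accesses: accesses to lines never touched before (always misses).
    cache_lines: capacity of the modelled cache, in lines.
    instructions: total instructions executed while profiling.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    # Under fully associative LRU, an access hits only if fewer than
    # `cache_lines` distinct lines were touched since its previous use.
    capacity_misses = sum(
        count for distance, count in reuse_hist.items() if distance >= cache_lines
    )
    return 1000.0 * (capacity_misses + cold_accesses) / instructions


# Made-up profile: 1,000,000 instructions, modelled cache of 4096 lines.
hist = {16: 200_000, 1024: 50_000, 8192: 4_000, 65536: 1_000}
print(estimate_misses_per_kilo_inst(hist, cold_accesses=500,
                                    cache_lines=4096, instructions=1_000_000))
```

Because the estimate depends only on the histogram and a chosen capacity, the same profile can be reused for several cache sizes without re-running the job.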
Prediction Model
• The graph plots SPEC CPU benchmarks by estimated L2 miss rate and clock speed elasticity
• We build a linear model and then use it to predict elasticity during offline analysis
• Constructed once, it can be used for all future analysis, unless a better model is proposed
[Figure: clock frequency elasticity (y-axis, -1.1 to -0.1; more sensitive toward -1.1, less sensitive toward -0.1) vs. L2 miss rate per 1000 instructions (x-axis, 0 to 20).]
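A linear model of this kind can be fitted with ordinary least squares. The sketch below is illustrative only: the (miss rate, elasticity) pairs are made-up placeholders rather than the SPEC CPU measurements from the graph, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_elasticity(model, miss_rate):
    """Predict clock speed elasticity from an estimated L2 miss rate."""
    slope, intercept = model
    return slope * miss_rate + intercept


# Placeholder training data (NOT measured values): L2 misses per 1000
# instructions and the corresponding clock speed elasticity.
miss_rates = [0.0, 5.0, 10.0, 20.0]
elasticities = [-1.0, -0.8, -0.6, -0.2]

model = fit_line(miss_rates, elasticities)
print(predict_elasticity(model, 7.5))
```

The model is fitted once on benchmarks whose elasticity has been measured; afterwards only the estimated miss rate of a new job is needed to predict its elasticity.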
Scheduling
Recall: the architectural signature contains elasticity values
Elasticity is straightforward to interpret
Using elasticity, the scheduler categorizes jobs as highly sensitive, moderately sensitive or insensitive
Finally, we’re ready to schedule
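One way the categorization and a core assignment could look is sketched below. The slides do not state the cut-off values or the exact assignment policy, so both are assumptions here: the thresholds are hypothetical, and the policy simply pairs the most sensitive jobs with the fastest cores, one job per core. Job names and elasticity values are made up.

```python
def sensitivity_category(elasticity, high=-0.7, moderate=-0.3):
    """Map an elasticity value to a sensitivity category.

    The cut-offs are hypothetical; the slides do not state the ones used.
    Elasticity is negative, and a larger magnitude means more sensitive.
    """
    if elasticity <= high:
        return "highly sensitive"
    if elasticity <= moderate:
        return "moderately sensitive"
    return "insensitive"


def assign_jobs(job_elasticity, core_ghz):
    """Pair the most sensitive jobs with the fastest cores, one job per core.

    job_elasticity: dict of job name -> elasticity from its signature.
    core_ghz: dict of core id -> clock frequency in GHz.
    Returns a dict of job name -> core id.
    """
    if len(job_elasticity) > len(core_ghz):
        raise ValueError("this sketch assumes at least one core per job")
    jobs = sorted(job_elasticity, key=job_elasticity.get)     # most negative first
    cores = sorted(core_ghz, key=core_ghz.get, reverse=True)  # fastest first
    return dict(zip(jobs, cores))


# Made-up jobs on a machine with two 2GHz and two 3GHz cores.
jobs = {"job_a": -0.95, "job_b": -0.15, "job_c": -0.5, "job_d": -0.85}
cores = {0: 2.0, 1: 2.0, 2: 3.0, 3: 3.0}
for name, core in assign_jobs(jobs, cores).items():
    print(name, sensitivity_category(jobs[name]), "-> core", core, cores[core], "GHz")
```

Because the decision uses only values stored in the signature, the scheduler needs no runtime performance monitoring to make it.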
Clock Speed Sensitivity Data Flow
MICA reuse distance data
abstract L2 cache miss rates
clock speed elasticity values
clock speed sensitivity category
Evaluating Clock Speed Sensitive Scheduling
Completion times with our clock speed aware prototype normalized to completion times with the default Linux 2.6.18 scheduler
[Figure: three bar charts of relative change in completion time (y-axis, 0.75 to 1.2), one bar per benchmark plus the geometric mean.
Benchmark sets shown: (wupwise, mgrid, apsi, facerec); (gcc, gap, eon, fma3d, mcf, equake, wupwise, lucas); (eon, crafty, mcf, equake).
Panel captions:
• Highly heterogeneous workload. Two 2GHz cores, two 3GHz cores.
• Balanced workload. One of each of 2GHz, 2.33GHz, 2.67GHz, 3GHz cores.
• Uniform workload. Two 2GHz cores, two 3GHz cores.]
Summary
A framework for developing microarchitecture-independent architectural signatures to assist heterogeneity-aware scheduling
Proof of concept: clock speed aware scheduling
Results: tangible benefits even on mildly heterogeneous platforms
• up to 4% average throughput increase on a multicore system with 2GHz and 3GHz cores
Future Work
Extend our framework to include other core characteristics (cache size, issue width, ...)
Develop and analyze a heterogeneity-aware scheduler in a real operating system (Sun Solaris)
Compare that scheduler with other heterogeneity-aware schedulers
References
[1] M. Becchi and P. Crowley. Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures. In Proceedings of the Conference on Computing Frontiers, 2006.
[2] K. Hoste and L. Eeckhout. Microarchitecture-Independent Workload Characterization. IEEE Micro Hot Tutorials, 27(3):63-72, 2007.
[3] R. Kumar, D. M. Tullsen, P. Ranganathan, N. Jouppi, and K. Farkas. Single-ISA Heterogeneous Multicore Architectures for Multithreaded Workload Performance. In Proceedings of the 31st Annual International Symposium on Computer Architecture, 2004.
Appendix A: Existing Approaches
Algorithms by Becchi and Crowley [1] and Kumar et al. [3]
These rely on performance monitoring to determine optimal assignment.
Potential drawbacks:
• don’t scale well to many types of cores
• limited applicability to short-lived threads
[Figure: scheduler, marked with a check mark]
Appendix B: Input Sets and Performance
Varying input sets can drastically affect performance
• ref vs. test input in SPEC CPU2000
One architectural signature can account for at most one input
Difficult problem that we are not currently tackling
There are smart ways to create parameterized approximations that account for data input size:
• Y. Zhong, S. G. Dropsho and C. Ding. Miss rate prediction across all program inputs. In Proceedings of Parallel Architectures and Compilation Techniques, 2003.
Appendix C: Elasticity
We need two measurements of completion time at two different frequencies
Then we calculate clock speed elasticity of completion time as follows (E = Elasticity, T = Completion time, F = clock speed):
$$E_{F,T} = \frac{T_2 - T_1}{F_2 - F_1} \cdot \frac{F_1 + F_2}{T_1 + T_2}$$
The larger the magnitude, the more sensitive the completion time is to clock speed
In this case, -1.0 is considered very elastic (sensitive), because it means that an increase in frequency by a factor of X will decrease the completion time by the same factor.
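The elasticity calculation can be sketched in code. The sketch assumes the midpoint (arc) form of elasticity, (T2 - T1)/(F2 - F1) multiplied by (F1 + F2)/(T1 + T2); this form is an assumption, chosen because it yields exactly -1.0 whenever frequency grows by a factor of X and completion time shrinks by the same factor, which matches the interpretation given on this slide. The function name and the measurements are made up for illustration.

```python
import math


def clock_speed_elasticity(t1, f1, t2, f2):
    """Midpoint (arc) elasticity of completion time T with respect to clock speed F.

    (t1, f1) and (t2, f2) are completion time / frequency measurements taken
    at two different clock speeds. Units cancel, so any consistent units work.
    """
    if f1 == f2:
        raise ValueError("the two measurements must use different frequencies")
    return ((t2 - t1) / (f2 - f1)) * ((f1 + f2) / (t1 + t2))


# Made-up measurements at 2 GHz and 3 GHz.
scales_fully = clock_speed_elasticity(t1=90.0, f1=2.0, t2=60.0, f2=3.0)  # T scales as 1/F
unaffected = clock_speed_elasticity(t1=90.0, f1=2.0, t2=90.0, f2=3.0)    # T ignores F
print(scales_fully, unaffected)
assert math.isclose(scales_fully, -1.0)
assert unaffected == 0.0
```

A job whose completion time does not change with frequency gets an elasticity of 0, the insensitive end of the scale.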
Appendix D: Different Cache Sizes
L2 miss rates (and elasticity) depend heavily on cache size => it has to be taken into account
Solution: calculate miss rates and elasticity for common cache configurations; the scheduler picks the appropriate one
Reasonable approach, because cache size aware scheduling takes precedence over clock speed aware scheduling
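A signature that carries one elasticity value per cache configuration might be looked up as sketched below. The layout, the class and method names, the selection rule (largest profiled L2 size that does not exceed the core's L2 size) and the sample values are all assumptions made for illustration; the slides do not specify the signature format.

```python
from dataclasses import dataclass


@dataclass
class ArchSignature:
    """Hypothetical signature layout: elasticity per profiled L2 size (in KB)."""
    elasticity_by_l2_kb: dict

    def elasticity_for(self, core_l2_kb):
        """Return the entry for the largest profiled L2 size that does not
        exceed the core's L2 size, falling back to the smallest profiled size."""
        sizes = sorted(self.elasticity_by_l2_kb)
        if not sizes:
            raise ValueError("signature has no entries")
        fitting = [size for size in sizes if size <= core_l2_kb]
        chosen = fitting[-1] if fitting else sizes[0]
        return self.elasticity_by_l2_kb[chosen]


# Made-up values: a larger cache means fewer misses, hence higher sensitivity.
sig = ArchSignature({512: -0.35, 1024: -0.55, 4096: -0.90})
print(sig.elasticity_for(2048))
```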