Motivation & Background :: Model Derivation :: Model Evaluation :: Conclusion
Accurate & Efficient Regression Modeling for Microarchitectural Performance & Power Prediction
Benjamin C. Lee, David M. Brooks
{bclee,dbrooks}@eecs.harvard.edu
Division of Engineering and Applied Sciences
Harvard University
24 October 2006
Benjamin C. Lee, David M. Brooks 1 :: ASPLOS-XII
Outline

Motivation & Background :: Simulation Challenges, Simulation Paradigm, Regression Theory
Model Derivation :: Experimental Methodology, Derivation Overview, Model Specification
Model Evaluation :: Performance, Power
Conclusion
Microarchitectural Design Space
Increasing diversity of interesting, viable designs
Examples :: POWER4, Pentium 4, UltraSPARC T1
Tractably quantify trends across comprehensive design space
Microarchitectural Simulation Challenges
Cycle-Accurate Simulation
Accurately identifies trends in design space
Tracks instructions' progress through microprocessor
Estimates performance, power, temperature, ...

Simulation Costs
Long simulation times (minutes, hours per design)
Number of potential simulations scales exponentially (m^p)
p :: parameter count
m :: parameter resolution
Microarchitectural Sampling
Temporal Sampling
Sample from instruction traces in time domain
Reduce simulation costs via size of inputs
Synthetic traces from profiled workloads [1]
Sampled traces from phase analysis [2]

Spatial Sampling
Sample from design space
Reduce simulation costs via number of simulations

[1] Eeckhout [ISPASS'00]
[2] Sherwood [ASPLOS'02], Wunderlich [ISCA'03]
Simulation Paradigm
Comprehensively understand design space
Specify large, high-resolution design space
Consider all design parameters simultaneously

Selectively simulate modest number of designs
Sample points randomly from design space for simulation
Decouple resolution of design space and simulation

Efficiently leverage simulation data with inference
Reveal trends, trade-offs from sparse sampling
Enable predictions for metrics of interest
Regression Theory
Statistical Inference
Models approximate solutions to intractable problems
Requires initial data to train, formulate model
Leverages correlations from initial data for prediction

Regression Models
Low formulation costs (1K samples from 1B designs)
Accurate inference (4–7% median error)
Efficient computation (100s of predictions per second)
Model Formulation
Notation
n observations :: {simulated design samples}
Response :: y = y1, ..., yn  {e.g., performance, power}
Predictors :: xi = xi,1, ..., xi,p  {e.g., depth, cache}
Regression Coefficients :: β = β0, ..., βp
Random Error :: e = e1, ..., en, where ei ∼ N(0, σ²)
Transformations :: f, g = g1, ..., gp

Model

f(y) = β0 + Σ_{j=1..p} βj gj(xj) + e
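The model form above can be sketched in code. This is a minimal illustration on synthetic data, not the authors' R/Hmisc workflow: a single predictor, identity f, and log2 standing in for the transformation g, fit by closed-form least squares.

```python
import math
import random

random.seed(0)

# Synthetic training data (illustrative, not the paper's simulator output):
# one predictor x (e.g., a cache size) with transformation g = log2,
# identity f, so the model is f(y) = b0 + b1*g(x) + e
xs = [2 ** k for k in range(4, 12) for _ in range(25)]
ys = [1.0 + 0.5 * math.log2(x) + random.gauss(0.0, 0.01) for x in xs]

# Closed-form least-squares estimates on the transformed predictor
gx = [math.log2(x) for x in xs]
gbar = sum(gx) / len(gx)
ybar = sum(ys) / len(ys)
b1 = (sum((g - gbar) * (y - ybar) for g, y in zip(gx, ys))
      / sum((g - gbar) ** 2 for g in gx))
b0 = ybar - b1 * gbar
```

With small noise, b0 and b1 land near the true coefficients 1.0 and 0.5, which is the sense in which the fitted model "leverages correlations from initial data for prediction".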
Predictor Interaction
Modeling Interaction
Suppose effects of predictors x1, x2 cannot be separated
Construct predictor x3 = x1x2

y = β0 + β1x1 + β2x2 + β3x1x2 + e

Example
Let x1 be pipeline depth, x2 be L2 cache size
Performance impact of pipelining affected by cache size

Speedup = Depth / (1 + Stalls/Inst)
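A tiny numeric sketch of why the interaction term matters (coefficients invented for illustration, not taken from the paper): with a β3·x1x2 term present, the marginal effect of x1 is itself a function of x2.

```python
# Hypothetical response surface with an interaction term x3 = x1*x2
def perf(x1, x2):
    b0, b1, b2, b3 = 1.0, 0.5, 0.25, 2.0
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# The marginal effect of x1 (e.g., depth) depends on x2 (e.g., cache size):
effect_small_cache = perf(2.0, 1.0) - perf(1.0, 1.0)  # b1 + b3*1 = 2.5
effect_large_cache = perf(2.0, 4.0) - perf(1.0, 4.0)  # b1 + b3*4 = 8.5
```

A model without the x1x2 column would be forced to report a single, averaged effect of x1 and miss this dependence.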
Predictor Non-Linearity I
Restricted Cubic Splines
Divide predictor domain into intervals separated by knots
Piecewise cubic polynomials joined at knots
Higher-order polynomials provide better fits [3]
[3] Stone [SS'86]
Predictor Non-Linearity II
Location of Knots
Location of knots less important than number of knots [4]
Place knots at fixed predictor quantiles

Number of Knots
Flexibility, risk of over-fitting increase with knot count
5 knots or fewer are often sufficient
4 knots balance flexibility and risk of over-fitting
[4] Harrell [Springer'01]
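The restricted cubic spline terms can be sketched as follows. This is the standard Harrell-style construction; the knot locations here are arbitrary illustrative values rather than quantiles of a real predictor. Each of the k−2 cubic terms is built so the overall curve stays linear before the first knot and after the last.

```python
def rcs_terms(x, knots):
    """Restricted cubic spline terms for predictor value x (Harrell-style):
    a linear term plus k-2 cubic terms constructed so the fitted curve is
    linear before the first knot and after the last knot."""
    pos = lambda v: max(v, 0.0)
    d = knots[-1] - knots[-2]
    terms = [x]
    for tj in knots[:-2]:
        terms.append(pos(x - tj) ** 3
                     - pos(x - knots[-2]) ** 3 * (knots[-1] - tj) / d
                     + pos(x - knots[-1]) ** 3 * (knots[-2] - tj) / d)
    return terms

# 4 knots at illustrative (not quantile-derived) locations
knots = [10.0, 20.0, 30.0, 40.0]
below = rcs_terms(5.0, knots)                             # below first knot
tail = [rcs_terms(x, knots) for x in (50.0, 55.0, 60.0)]  # beyond last knot
```

Below the first knot every cubic term is zero (purely linear), and beyond the last knot the second differences of each term vanish, confirming the linear-tail constraint.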
Prediction
Expected Response
β are known from least squares
xi,1, ..., xi,p are known for a given query i
Expected response is weighted sum of predictor values

E[y] = E[β0 + Σ_{j=1..p} βj xj] + E[e]
     = β0 + Σ_{j=1..p} βj xj
Tools and Benchmarks
Simulation Framework
Turandot :: a cycle-accurate, trace-driven simulator
PowerTimer :: power models derived from circuit analyses
Baseline simulator models POWER4/POWER5 architecture

Benchmarks
SPEC2kCPU :: compute-intensive benchmarks
SPECjbb :: Java server benchmark

Statistical Framework
R :: software environment for statistical computing
Hmisc and Design packages [5]
[5] Harrell [Springer'01]
Spatial Sampling
Design Space
Si :: set of values for parameter xi, i ∈ [1, p]
S = ∏_{i=1..p} Si :: design space
B :: set of benchmarks
|S| ≈ 10^9 and |B| = 22

Sampling Uniformly at Random (UAR)
Sample n = 4,000 designs and benchmarks for simulation
Decouple resolution of design space and simulation
Unbiased observations from full range of parameter values
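Sampling UAR can be sketched with the standard library. The parameter sets and benchmark names below are a small hypothetical stand-in, not the simulator's actual interface; the point is that each sampled design draws every parameter independently and uniformly from its full set.

```python
import random

random.seed(0)

# Illustrative stand-in for the design space: a few parameter sets only,
# with hypothetical names (the full space has 12 sets and ~10^9 points)
design_space = {
    "depth_fo4": list(range(9, 37, 3)),
    "width": [4, 8, 16],
    "l2_log2_entries": list(range(11, 16)),
    "mem_lat_cycles": list(range(70, 116, 5)),
}
benchmarks = ["gcc", "mcf", "specjbb"]  # illustrative subset of the 22

def sample_uar(n):
    """Draw n (design, benchmark) points uniformly at random; the number of
    simulations is decoupled from the resolution of the design space."""
    return [({p: random.choice(vals) for p, vals in design_space.items()},
             random.choice(benchmarks))
            for _ in range(n)]

samples = sample_uar(4000)
```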
Predictors :: Microarchitecture

Set | Parameters                                  | Measure       | Range      | |Si|
S1  | Depth :: depth                              | FO4           | 9::3::36   | 10
S2  | Width :: width                              | insn b/w      | 4,8,16     | 3
    |   L/S reorder queue                         | entries       | 15::15::45 |
    |   store queue                               | entries       | 14::14::42 |
    |   functional units                          | count         | 1,2,4      |
S3  | Physical Registers :: general purpose (GP)  | count         | 40::10::130| 10
    |   floating-point (FP)                       | count         | 40::8::112 |
    |   special purpose (SP)                      | count         | 42::6::96  |
S4  | Reservation Stations :: branch              | entries       | 6::1::15   | 10
    |   fixed-point/memory                        | entries       | 10::2::28  |
    |   floating-point                            | entries       | 5::1::14   |
S5  | I-L1 Cache :: i-L1 cache size               | log2(entries) | 7::1::11   | 5
S6  | D-L1 Cache :: d-L1 cache size               | log2(entries) | 6::1::10   | 5
S7  | L2 Cache :: L2 cache size                   | log2(entries) | 11::1::15  | 5
    |   L2 cache latency                          | cycles        | 6::2::14   |
S8  | Control Latency :: branch latency           | cycles        | 1,2        | 2
S9  | FX Latency :: ALU latency                   | cycles        | 1::1::5    | 5
    |   FX-multiply latency                       | cycles        | 4::1::8    |
    |   FX-divide latency                         | cycles        | 35::5::55  |
S10 | FP Latency :: FPU latency                   | cycles        | 5::1::9    | 5
    |   FP-divide latency                         | cycles        | 25::5::45  |
S11 | L/S Latency :: load/store latency           | cycles        | 3::1::7    | 5
S12 | Memory Latency :: main memory latency       | cycles        | 70::5::115 | 10
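As a sanity check on the table, the a::b::c range notation (values a, a+b, ..., c) and the per-set resolutions |Si| multiply out to roughly 10^9 design points, matching |S| ≈ 10^9:

```python
import math

def resolution(a, b, c):
    """Number of values in a range a::b::c, i.e., {a, a+b, ..., c}."""
    return (c - a) // b + 1

# |Si| for sets S1..S12, matching the table's rightmost column
set_sizes = [
    resolution(9, 3, 36),     # S1 depth
    3,                        # S2 width (4, 8, 16)
    resolution(40, 10, 130),  # S3 physical registers
    resolution(6, 1, 15),     # S4 reservation stations
    resolution(7, 1, 11),     # S5 i-L1 cache
    resolution(6, 1, 10),     # S6 d-L1 cache
    resolution(11, 1, 15),    # S7 L2 cache
    2,                        # S8 control latency (1, 2)
    resolution(1, 1, 5),      # S9 FX latency
    resolution(5, 1, 9),      # S10 FP latency
    resolution(3, 1, 7),      # S11 L/S latency
    resolution(70, 5, 115),   # S12 memory latency
]

total = math.prod(set_sizes)  # |S| = product of |Si|
```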
Predictors :: Application-Specific
Application Characteristics
Collect program characteristics on baseline architecture
Instruction throughput
Cache access patterns
Branch patterns
Sources of pipeline stalls

Application Effects
Significant interactions with microarchitectural predictors
Example :: Impact of d-L1 cache affected by access rates
Derivation Overview
Hierarchical Clustering
Performance Associations and Correlations :: qualitative scatterplots, quantitative ρ²
Model Specification :: predictor interaction, non-linearity
Assessing Fit :: R² statistic
Residual Analysis :: normality (quantile-quantile), randomness (scatterplots)
Significance Testing :: hypothesis testing, F-statistic, p-values
Performance Correlations

[Figure :: not recoverable from extraction]
Model Specification
Interactions
Pipeline width/depth interact with
  instruction bandwidth structures (queues, register file)
  cache hierarchy
Cache hierarchy sizes interact with
  adjacent levels in hierarchy
  application-specific access rates
Baseline performance interacts with resource sizings

Restricted Cubic Splines
Weaker relationships (latencies, caches, queues) :: 3 knots
Stronger relationships (depth, registers) :: 4 knots
Baseline performance :: 5 knots
Validation Approach
Framework
Formulate models with n < 4,000 samples
Obtain 100 additional random samples for validation
Quantify percentage error, 100 · |ŷi − yi| / yi

Model Variants
Baseline (B) :: Non-transformed response
Stabilized (S) :: Square root of response
Regional (S+R) :: Per query, with similar samples
Application (S+A) :: Per benchmark, with similar samples
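The validation metric is straightforward to compute; a sketch with made-up numbers (not the paper's measurements):

```python
from statistics import median

def pct_errors(actual, predicted):
    """Per-sample percentage error, 100 * |y_hat - y| / y."""
    return [100.0 * abs(p - a) / a for a, p in zip(actual, predicted)]

# Made-up actual/predicted responses for illustration
errs = pct_errors([1.0, 2.0, 4.0], [1.1, 1.9, 4.0])
med = median(errs)  # median percentage error over the validation set
```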
Regional Sampling

[Figure :: not recoverable from extraction]
Performance Prediction

[Figure :: not recoverable from extraction]
Performance Sensitivity :: S+A

[Figure :: not recoverable from extraction]
Power Prediction

[Figure :: not recoverable from extraction]
Power Sensitivity :: S+R Region

[Figure :: not recoverable from extraction]
Conclusion
Simulation Paradigm
Comprehensively understand design space
Selectively simulate modest number of designs
Efficiently leverage simulation data with inference

Model Evaluation
7.4%, 4.3% median errors for performance, power
S+A, S+R more accurate for performance, power

Future Directions
Demonstrate for comprehensive design studies [6]
Expand design space and benchmark suite
Extend to CMPs and interconnect modeling
[6] Lee [HPCA'07] :: www.deas.harvard.edu/~bclee
Appendix :: References :: Extra Slides

Appendix
www.deas.harvard.edu/~bclee
B.C. Lee and D.M. Brooks. Illustrative design space studies with microarchitectural regression models. HPCA-13: International Symposium on High Performance Computer Architecture, Feb 2007.

B.C. Lee and D.M. Brooks. Accurate, efficient regression modeling for microarchitectural performance, power prediction. ASPLOS-XII: International Conference on Architectural Support for Programming Languages and Operating Systems, Oct 2006.

B.C. Lee and D.M. Brooks. Statistically rigorous regression modeling for the microprocessor design space. MoBS-2: Workshop on Modeling, Benchmarking, and Simulation, June 2006.

B.C. Lee and D.M. Brooks. Regression modeling strategies for microarchitectural performance and power prediction. Harvard University Technical Report TR-08-06, March 2006.
References I

Y. Li, B.C. Lee, D. Brooks, Z. Hu, K. Skadron. CMP design space exploration subject to physical constraints. HPCA-12: International Symposium on High-Performance Computer Architecture, Feb 2006.

L. Eeckhout, S. Nussbaum, J. Smith, and K. De Bosschere. Statistical simulation: Adding efficiency to the computer designer's toolbox. IEEE Micro, Sept/Oct 2003.

R. Liu and K. Asanovic. Accelerating architectural exploration using canonical instruction segments. International Symposium on Performance Analysis of Systems and Software, Austin, Texas, March 2006.

T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. ASPLOS-X: Architectural Support for Programming Languages and Operating Systems, October 2002.

B.C. Lee and D.M. Brooks. Effects of pipeline complexity on SMT/CMP power-performance efficiency. ISCA-32: Workshop on Complexity Effective Design, June 2005.
References II

C. Stone. Comment: Generalized additive models. Statistical Science, 1986.

F. Harrell. Regression modeling strategies. Springer, New York, NY, 2001.

J. Yi, D. Lilja, and D. Hawkins. Improving computer architecture simulation methodology by adding statistical rigor. IEEE Computer, Nov 2005.

P. Joseph, K. Vaswani, and M.J. Thazhuthaveetil. Construction and use of linear regression models for processor performance analysis. HPCA-12: International Symposium on High Performance Computer Architecture, Austin, Texas, February 2006.

S. Nussbaum and J. Smith. Modeling superscalar processors via statistical simulation. PACT-2001: International Conference on Parallel Architectures and Compilation Techniques, Barcelona, Sept 2001.
Controlling Simulation Costs
Hybrid Simulation
Decouples simulation of microprocessor structures
Leverages fast, specialized simulators for particular units [7]

Trace Sampling/Compression
Reduces redundant simulation
Simulate unique, representative instruction segments [8]

Synthetic Workloads
Reduces size of simulator inputs
Profiles workload to construct smaller, synthetic traces [9]

[7] Li, Lee, Brooks, Hu, Skadron [HPCA'06]
[8] Liu, Asanovic [ISPASS'06]; Sherwood et al. [ASPLOS'02]
[9] Eeckhout, Nussbaum, Smith, De Bosschere [IEEE Micro'03]
Variable Clustering
[Figure :: hierarchical clustering of predictors by Spearman ρ² (y-axis 0.0–1.0); clustered variables include stall_cast, stall_resv, dl1miss_rate, iss_rate, base_bips, dl2miss_rate, stall_dmissq, stall_storeq, br_mis_rate, stall_reorderq, fix_lat, ctl_lat, ls_lat, fpu_lat, depth, mem_lat, width, bips, phys_reg, l2cache_size, resv, il1miss_rate, il2miss_rate, stall_inflight, stall_rename, br_rate, br_stall, icache_size, dcache_size]
Performance Associations
[Figure :: boxplots of bips against binned predictor values, N = 4000; panels for Pipeline (depth, width, phys_reg, resv), Memory Hierarchy (l2cache_size, icache_size, dcache_size), and Cache miss rates (il1miss_rate, dl1miss_rate, dl2miss_rate)]
Assessing Fit

Multiple Correlation Statistic
R² is fraction of response variance captured by predictors
Large R² suggests better fit to observed data
R² → 1 suggests over-fitting (less likely if p < n/20)

R² = 1 − Σ_{i=1..n} (yi − ŷi)² / Σ_{i=1..n} (yi − ȳ)², where ȳ = (1/n) Σ_{i=1..n} yi
Residual Distribution Assumptions
Residuals are normally distributed, ei ∼ N(0, σ²)
No correlation between residuals and response, predictors
Validate by scatterplots and quantile-quantile plots

ei = yi − β0 − Σ_{j=1..p} βj xi,j
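The R² statistic is simple to compute directly; a sketch with illustrative values (not the paper's data), showing the two extremes of a perfect fit and a mean-only fit:

```python
def r_squared(ys, y_hats):
    """R^2 = 1 - SSE/SST, the fraction of response variance captured."""
    ybar = sum(ys) / len(ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))
    sst = sum((y - ybar) ** 2 for y in ys)
    return 1.0 - sse / sst

# Illustrative responses: a perfect fit gives R^2 = 1, predicting the
# mean for every observation gives R^2 = 0
ys = [1.0, 2.0, 3.0, 4.0]
r2_perfect = r_squared(ys, ys)
r2_mean = r_squared(ys, [2.5, 2.5, 2.5, 2.5])
```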
Predictor Non-Linearity I
Polynomial Transformations
Undesirable peaks and valleys
Differing trends across regions

Linear Splines
Piecewise linear regions separated by knots
Inadequate for complex, highly curved relationships

Restricted Cubic Splines
Higher-order polynomials provide better fits
Continuous at knots
Linear constraint on tails
Predictor Non-Linearity II
Location of Knots
Location of knots less important than number of knots
Place knots at fixed predictor quantiles

Number of Knots
Flexibility, risk of over-fitting increase with knot count
5 knots or fewer are often sufficient [10]
4 knots is a good compromise between flexibility and over-fitting
Fewer knots required for small data sets
[10] Stone [SS'86]
Significance Testing I
Approach
Given two nested models, hypothesis H0 states additional predictors in larger model have no response association
Test H0 with F-statistics and p-values

Example
Predictor interaction requires comparing nested models
Consider a model y = β0 + β1x1 + β2x2 + β3x1x2
Test significance of x1 with null hypothesis H0 : β1 = β3 = 0
Significance Testing II

F-Statistic
Compare two nested models using their R² and F-statistic
R² is fraction of response variance captured by predictors

R² = 1 − Σ_{i=1..n} (yi − ŷi)² / Σ_{i=1..n} (yi − ȳ)²

F-statistic of two nested models follows the F distribution

F_{k, n−p−1} = (R² − R²*) / k × (n − p − 1) / (1 − R²)

where R²* is the R² of the smaller nested model and k is the number of additional predictors in the larger model

P-Values
Probability an F-statistic greater than or equal to the observed value would occur under H0
Small p-values cast doubt on H0
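The nested-model comparison can be sketched as follows; the R² values and model sizes are invented for illustration, and converting the F-statistic to a p-value is left to a statistics package.

```python
def partial_f(r2_full, r2_nested, k, n, p):
    """F-statistic comparing a full model (p predictors, R^2 = r2_full)
    against a nested model that omits k of those predictors."""
    return ((r2_full - r2_nested) / k) * ((n - p - 1) / (1.0 - r2_full))

# Made-up illustration: dropping 2 of 10 predictors costs 0.02 of R^2
# over n = 1000 observations -> a large F, so H0 would be rejected
f_stat = partial_f(r2_full=0.90, r2_nested=0.88, k=2, n=1000, p=10)
```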
Treatment of Missing Data
Missing Completely at Random (MCAR)
Treat unobserved design points as missing data
Sampling UAR ensures observations are MCAR
Data are missing for reasons unrelated to characteristics or responses of the configuration

Informative Missing
Data are more likely to be missing if their responses are systematically higher or lower
"Missingness" is non-ignorable and must also be modeled
Sampling UAR avoids such modeling complications
Performance Associations I
[Figure :: boxplots of bips against binned predictor values, N = 4000; panels for Pipeline (depth, width, phys_reg, resv), Latency (mem_lat, ls_lat, ctl_lat, fix_lat, fpu_lat), and Memory Hierarchy (l2cache_size, icache_size, dcache_size)]
Performance Associations II
[Figure :: boxplots of bips against binned predictor values, N = 4000; panels for Branch (br_rate, br_mis_rate) and Stalls (stall_inflight, stall_dmissq, stall_cast, stall_storeq, stall_reorderq, stall_resv, stall_rename)]
Performance Associations III
[Figure :: boxplots of bips against binned predictor values, N = 4000; panels for Cache miss rates (il1miss_rate, dl1miss_rate, dl2miss_rate) and Baseline Performance (base_bips)]
Significance Tests
Microarchitectural Predictors
Majority of F-tests imply significance (p-values < 2.2e−16)
Several predictors were less significant ::
  control latency (p-value = 0.1247)
  reservation station size (p-value = 0.1239)
  L1 instruction cache size (p-value = 0.02941)

Application-Specific Predictors
Majority of F-tests imply significance (p-values < 2.2e−16)
Pipeline stalls classified by structure are less significant ::
  completion and reorder queue stalls (p-values > 0.4)
Related Work
Statistical Significance Ranking
Yi :: Plackett-Burman, effect rankings
Joseph :: Stepwise regression, coefficient rankings
Bound parameter values to improve tractability
Require simulation for estimation

Synthetic Workloads
Eeckhout :: Profile workloads to obtain synthetic traces
Nussbaum :: Superscalar and SMP simulation
Obtain distribution of instructions and data dependencies
Require simulation with smaller traces for estimation