Statistically Rigorous Regression Modeling for the Microprocessor Design Space

Benjamin C. Lee (1,2), David M. Brooks (1)
[email protected]
(1) Division of Engineering and Applied Sciences, Harvard University
(2) Center for Applied Scientific Computing Research, Lawrence Livermore National Laboratory

Benjamin C. Lee, David M. Brooks :: 18 June 2006 :: Workshop on Modeling, Benchmarking, and Simulation
Outline

- Motivation & Background
  - Simulation Challenges
  - Simulation Paradigms
  - Regression Theory
- Model Derivation
  - Experimental Methodology
  - Correlation Analysis
  - Model Specification
- Model Evaluation
  - Validation Approach
  - Performance
  - Power
- Conclusion
  - Summary
  - Future Directions
Microarchitectural Design Space
- Trend toward chip multiprocessors (CMPs) with varying core designs
  - POWER4, Pentium 4, UltraSPARC T1
- Tractably quantify trade-offs between core complexity and core count
Design Space Exploration
- Limitations of existing simulation methodology
  - Trace sampling and compression reduce per-simulation costs
  - Existing techniques do not reduce the number of simulations
  - Space size increases exponentially with parameter count
  - Multi-threaded, multi-core simulations are further constrained
- Prior design space analyses
  - Consider mp design points
  - Vary one or two parameters at fine granularity
  - Vary multiple parameters at coarse granularity
  - Hold the majority of parameters at constant values
Simulation Paradigms
- Objectives
  - Comprehensively understand the microprocessor design space
  - Selectively perform a modest number of simulations
  - Efficiently leverage simulation data
- Random configuration sampling
  - Sample points uniformly at random (UAR) from the design space for simulation
  - Controls the exponential increase in design count
- Statistical inference
  - Reveals trends and trade-offs from sparse sampling
  - Enables prediction for metrics of interest
Statistical Inference
- Approach
  - Models approximate solutions to intractable problems
  - Requires initial data to train and formulate the model
  - Leverages correlations in the initial data for prediction
- Regression modeling
  - Efficient formulation :: sample 1K of ≈1B points, least squares
  - Accurate inference :: 4-7% median error
  - Static accuracy :: no predictive training
Model Formulation

- Notation
  - n observations
  - Response :: y = y_1, ..., y_n
  - Predictors :: x_i = x_{i,1}, ..., x_{i,p}
  - Regression coefficients :: β = β_0, ..., β_p
  - Random error :: e = e_1, ..., e_n where e_i ∼ N(0, σ²)
  - Transformations :: f, g = g_1, ..., g_p
- Model

    f(y_i) = β g(x_i) + e_i
           = β_0 + Σ_{j=1}^{p} β_j g_j(x_{ij}) + e_i
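The coefficients of such a model are typically estimated by ordinary least squares. A minimal sketch with numpy, using invented synthetic data and identity transformations f and g (all names and values here are illustrative, not from the study):

```python
import numpy as np

# Sketch: fit f(y_i) = beta_0 + sum_j beta_j * g_j(x_ij) + e_i by
# ordinary least squares, with f and g taken as identity maps.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.uniform(size=(n, p))                 # n observations of p predictors
true_beta = np.array([2.0, 1.0, -0.5, 3.0])  # beta_0 .. beta_3 (assumed for demo)
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.01, size=n)

# Augment with an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```

With little noise and n much larger than p, `beta_hat` recovers the assumed coefficients closely; the same machinery applies once `X` holds transformed or interacted predictors.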
Predictor Interaction

- Modeling interaction
  - Suppose the effects of predictors x_1, x_2 cannot be separated
  - Construct the interaction predictor x_3 = x_1 x_2

    y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i1} x_{i2} + e_i

- Example
  - Let x_1 be pipeline depth and x_2 be L2 cache size
  - The performance impact of pipelining is affected by cache size

    Speedup = Depth / (1 + Stalls/Inst)
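An interaction term is just one more column in the design matrix. A hedged sketch with synthetic data (the coefficients and ranges are invented for illustration, not taken from the study):

```python
import numpy as np

# Sketch: fit y = b0 + b1*x1 + b2*x2 + b3*x1*x2 by constructing the
# interaction predictor x3 = x1 * x2 (data and coefficients are invented).
rng = np.random.default_rng(1)
x1 = rng.uniform(1, 4, size=200)   # stand-in for pipeline depth
x2 = rng.uniform(1, 4, size=200)   # stand-in for L2 cache size
y = 0.5 + 0.3 * x1 + 0.2 * x2 - 0.1 * x1 * x2 + rng.normal(scale=0.01, size=200)

# Design matrix: intercept, main effects, and the interaction column x3.
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The fitted `beta[3]` captures how the effect of one predictor changes with the level of the other, which a purely additive model cannot express.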
Predictor Non-Linearity

- Restricted cubic splines
  - Divide the predictor domain into intervals separated by knots
  - Piecewise cubic polynomials joined at the knots (Stone [SS'86])
  - Higher-order polynomials provide better fits
- Location of knots
  - Location of knots is less important than the number of knots
  - Place knots at fixed predictor quantiles
- Number of knots
  - Flexibility and risk of over-fitting increase with knot count
  - 5 knots or fewer are often sufficient
  - 4 knots balances flexibility and over-fitting
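One common construction of a restricted cubic spline basis follows Harrell's formulation: with k knots, the basis has the raw predictor plus k-2 cubic terms constrained to be linear beyond the outermost knots. The function below is a sketch of that construction (not the authors' code), with knots placed at fixed quantiles as the slide suggests:

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis with linear tails (Harrell-style).
    Returns columns [x, S_1(x), ..., S_{k-2}(x)] for k knots."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    pos3 = lambda u: np.maximum(u, 0.0) ** 3   # truncated cube (u)_+^3
    cols = [x]
    for j in range(k - 2):
        cols.append(
            pos3(x - t[j])
            - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
            + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2])
        )
    return np.column_stack(cols)

# Place 4 knots at fixed quantiles of the predictor (quantile choices assumed).
x = np.linspace(0, 10, 101)
knots = np.quantile(x, [0.05, 0.35, 0.65, 0.95])
B = rcs_basis(x, knots)   # 101 x 3 design columns for this one predictor
```

The cubic coefficients are chosen so the x³ and x² terms cancel beyond the last knot, which is what keeps the tails linear and tames over-fitting at the extremes.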
Prediction

- Expected response
  - Suppose the coefficients β and predictors x_{i,1}, ..., x_{i,p} are known
  - The expected response is a weighted sum of the predictor values

    E[y_i] = E[β_0 + Σ_{j=1}^{p} β_j x_{ij}] + E[e_i]
           = β_0 + Σ_{j=1}^{p} β_j x_{ij}
Tools and Benchmarks

- Simulation framework
  - Turandot :: a cycle-accurate, trace-driven simulator
  - PowerTimer :: power models derived from circuit analyses
  - Baseline simulator models the POWER4/POWER5 architecture
- Benchmarks
  - SPEC CPU2000 :: compute-intensive benchmarks
  - SPECjbb :: Java server benchmark
- Statistical framework
  - R :: software environment for statistical computing
  - Hmisc and Design packages (Harrell [Springer '01])
Configuration Sampling

- Design space size
  - For i ∈ [1, p], S_i defines the possible values of parameter x_i
  - S = Π_{i=1}^{p} S_i defines the design space
  - |S| = Π_{i=1}^{p} |S_i| defines the space size
  - B defines the set of benchmarks; |B| × |S| potential simulations
  - |S| ≈ 10⁹ and |B| = 22
- Sampling uniformly at random (UAR)
  - Sample n = 4,000 design points and benchmarks
  - Unbiased observations from the full range of parameter values
  - Trends and trade-offs between parameters at fine granularity
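Sampling UAR from the Cartesian product is simple: draw each parameter independently and uniformly from its value set. A sketch using a few of the parameter ranges from the predictor table (the dictionary keys and the subset of parameters are illustrative):

```python
import random

# Hypothetical slice of the design space S = S_1 x ... x S_p; each entry
# lists the legal values of one parameter (ranges taken from the table).
design_space = {
    "depth":       list(range(9, 37, 3)),    # FO4 9::3::36  (10 values)
    "width":       [4, 8, 16],               # insn b/w      (3 values)
    "l2_log_size": list(range(11, 16)),      # log2(entries) 11::1::15
    "mem_lat":     list(range(70, 116, 5)),  # cycles 70::5::115
}

rng = random.Random(0)

def sample_uar(space, n):
    """Draw n design points UAR: each parameter is chosen independently
    and uniformly, so every point in the product space is equally likely."""
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]

points = sample_uar(design_space, 5)
```

Because each draw is independent and uniform, the sample size needed for the regression is fixed (here n = 4,000 in the study) regardless of how large the product space grows.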
Predictors :: Microarchitecture

Set  Group              Parameter                 Measure         Range        |Si|
S1   Depth              depth                     FO4             9::3::36     10
S2   Width              width                     insn b/w        4,8,16       3
                        L/S reorder queue         entries         15::15::45
                        store queue               entries         14::14::42
                        functional units          count           1,2,4
S3   Physical           general purpose (GP)      count           40::10::130  10
     Registers          floating-point (FP)       count           40::8::112
                        special purpose (SP)      count           42::6::96
S4   Reservation        branch                    entries         6::1::15     10
     Stations           fixed-point/memory        entries         10::2::28
                        floating-point            entries         5::1::14
S5   I-L1 Cache         i-L1 cache size           log2(entries)   7::1::11     5
S6   D-L1 Cache         d-L1 cache size           log2(entries)   6::1::10     5
S7   L2 Cache           L2 cache size             log2(entries)   11::1::15    5
                        L2 cache latency          cycles          6::2::14
S8   Control Latency    branch latency            cycles          1,2          2
S9   FX Latency         ALU latency               cycles          1::1::5      5
                        FX-multiply latency       cycles          4::1::8
                        FX-divide latency         cycles          35::5::55
S10  FP Latency         FPU latency               cycles          5::1::9      5
                        FP-divide latency         cycles          25::5::45
S11  L/S Latency        load/store latency        cycles          3::1::7      5
S12  Memory Latency     main memory latency       cycles          70::5::115   10
Predictors :: Application-Specific

- Application characteristics
  - Collect program characteristics on the baseline architecture
  - Baseline instruction throughput (BIPS)
  - Cache access patterns (i-L1, d-L1, L2 miss rates)
  - Branch patterns (branch frequency, misprediction rate)
  - Sources of pipeline stalls (per-queue stall histograms)
- Application effects
  - Characteristics are significant predictors when interacting with microarchitectural predictors
  - Example :: the impact of the d-L1 cache is affected by access rates
Variable Clustering
stal
l_ca
stst
all_
resv
dl1m
iss_
rate
base
_bip
sdl
2mis
s_ra
test
all_
dmis
sqst
all_
stor
eqbr
_mis
_rat
est
all_
reor
derq
fix_l
atct
l_la
tls
_lat
fpu_
lat
dept
hm
em_l
atw
idth
bips
phys
_reg
l2ca
che_
size
resv
il1m
iss_
rate
il2m
iss_
rate stal
l_in
fligh
tst
all_
rena
me
br_r
ate
br_s
tall
icac
he_s
ize
dcac
he_s
ize
1.0
0.8
0.6
0.4
0.2
0.0
Spe
arm
an ρ
2
Benjamin C. Lee, David M. Brooks :: 18 June 2006 17 :: Workshop on Modeling, Benchmarking, and Simulation
Strength of Marginal Relationships
Regression Model Specification

- Interactions
  - Pipeline width and depth interact with
    - instruction bandwidth structures (queues, register file)
    - the cache hierarchy
  - Cache hierarchy sizes interact with
    - adjacent levels in the hierarchy
    - application-specific access rates
  - Baseline performance interacts with resource sizings
- Restricted cubic splines
  - Weaker relationships (latencies, caches, queues) :: 3 knots
  - Stronger relationships (depth, registers) :: 4 knots
  - Baseline application performance :: 5 knots
Validation Approach

- Framework
  - Formulate models with n* < n = 4,000 samples
  - Obtain 100 additional random samples for validation
  - Quantify percentage error, 100 × |ŷ_i − y_i| / y_i
- Model variants
  - Baseline (B) :: model the non-transformed response
  - Variance-stabilized (S) :: model the square root of the response
  - Regional (S+R) :: for each query, reformulate the model with the samples configured most similarly to the query, ranked by

    d = [ Σ_{i=1}^{p} ((a_i − b_i) / a_i)² ]^{1/2}

  - Application-specific (S+A) :: fix the sampled benchmarks
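The two validation metrics are straightforward to compute. A sketch in plain Python (the query and sample values are invented for illustration):

```python
import math

def pct_error(y_true, y_pred):
    """Percentage error: 100 * |y_pred - y_true| / y_true."""
    return 100.0 * abs(y_pred - y_true) / y_true

def distance(a, b):
    """Normalized Euclidean distance used by the regional (S+R) variant:
    d = [ sum_i ((a_i - b_i)/a_i)^2 ]^(1/2) for query a, sample b."""
    return math.sqrt(sum(((ai - bi) / ai) ** 2 for ai, bi in zip(a, b)))

# Rank training samples by similarity to a query configuration and keep
# the closest one (toy values: e.g. depth, width, log2 L2 cache size).
query = [18, 8, 13]
samples = [[9, 4, 11], [21, 8, 14], [36, 16, 15]]
nearest = min(samples, key=lambda s: distance(query, s))
```

In the regional variant, the model would be refit on the r most similar samples rather than a single nearest neighbor; `min` here just shows the ranking step.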
Performance Prediction
Power Prediction
Performance-Power Comparison

- Performance accuracy
  - 7.4% median error for the S+A model
  - S+A reduces performance variance across applications
  - S+R is ineffective since the application is the primary determinant of performance
- Power accuracy
  - 4.3% median error for the S+R model
  - S+R reduces power variance across configurations
  - S+A is ineffective since resource sizings are the primary determinants of power
Summary

- Simulation challenges
  - Design space studies limited by simulation costs
  - Existing frameworks reduce only per-simulation costs
- Regression models
  - Sampling :: 1K of ≈1B configurations UAR
  - Specification :: correlation analyses
  - Refinement :: stabilizing transformations
- Model evaluation
  - 7.4% and 4.3% median errors for performance and power
  - S+A, S+R more effective for performance, power respectively
Future Directions

- Model applications
  - Demonstrate applicability to prior studies
  - Models enable more aggressive studies
  - Construct a CMP simulation framework
- Model improvements
  - Techniques and transformations to further reduce error and bias
- Survey approaches in statistical inference
  - Compare regression modeling with machine learning
Appendix :: Links, References, Extra Slides
Publications
www.deas.harvard.edu/~bclee

B.C. Lee and D.M. Brooks. Statistically rigorous regression modeling for the microprocessor design space. ISCA-33: Workshop on Modeling, Benchmarking, and Simulation, June 2006.

B.C. Lee and D.M. Brooks. Accurate, efficient regression modeling for microarchitectural performance, power prediction. ASPLOS-XII: International Conference on Architectural Support for Programming Languages and Operating Systems, Oct 2006. (To appear)
References I

Y. Li, B.C. Lee, D. Brooks, Z. Hu, and K. Skadron. CMP design space exploration subject to physical constraints. HPCA-12: International Symposium on High-Performance Computer Architecture, Feb 2006.

L. Eeckhout, S. Nussbaum, J. Smith, and K. De Bosschere. Statistical simulation: Adding efficiency to the computer designer's toolbox. IEEE Micro, Sept/Oct 2003.

R. Liu and K. Asanovic. Accelerating architectural exploration using canonical instruction segments. International Symposium on Performance Analysis of Systems and Software, Austin, Texas, March 2006.

T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. ASPLOS-X: Architectural Support for Programming Languages and Operating Systems, October 2002.

B.C. Lee and D.M. Brooks. Effects of pipeline complexity on SMT/CMP power-performance efficiency. ISCA-32: Workshop on Complexity Effective Design, June 2005.
References II

C. Stone. Comment: Generalized additive models. Statistical Science, 1986.

F. Harrell. Regression Modeling Strategies. Springer, New York, NY, 2001.

J. Yi, D. Lilja, and D. Hawkins. Improving computer architecture simulation methodology by adding statistical rigor. IEEE Computer, Nov 2005.

P. Joseph, K. Vaswani, and M.J. Thazhuthaveetil. Construction and use of linear regression models for processor performance analysis. HPCA-12: International Symposium on High Performance Computer Architecture, Austin, Texas, February 2006.

S. Nussbaum and J. Smith. Modeling superscalar processors via statistical simulation. PACT 2001: International Conference on Parallel Architectures and Compilation Techniques, Barcelona, Sept 2001.
Assessing Fit

- Multiple correlation statistic
  - R² is the fraction of response variance captured by the predictors
  - A large R² suggests a better fit to the observed data
  - R² → 1 suggests over-fitting (less likely if p < n/20)

    R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²,  where ȳ = (1/n) Σ_{i=1}^{n} y_i

- Residual distribution assumptions
  - Residuals are normally distributed, e_i ∼ N(0, σ²)
  - No correlation between residuals and the response or predictors
  - Validate with scatterplots and quantile-quantile plots

    e_i = y_i − β_0 − Σ_{j=1}^{p} β_j x_{ij}
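Both fit statistics reduce to a few sums. A plain-Python sketch with invented toy data:

```python
# Toy observed responses and model predictions (values invented for demo).
y     = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

# R^2 = 1 - SS_res / SS_tot: fraction of response variance captured.
y_bar  = sum(y) / len(y)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1.0 - ss_res / ss_tot

# Residuals e_i = y_i - yhat_i, to be checked against the normality and
# zero-correlation assumptions (e.g. via scatter and Q-Q plots).
residuals = [yi - fi for yi, fi in zip(y, y_hat)]
```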
Predictor Non-Linearity I

- Polynomial transformations
  - Undesirable peaks and valleys
  - Differing trends across regions
- Linear splines
  - Piecewise linear regions separated by knots
  - Inadequate for complex, highly curved relationships
- Restricted cubic splines
  - Higher-order polynomials provide better fits
  - Continuous at the knots
  - Linearity constraint on the tails
Predictor Non-Linearity II

- Location of knots
  - Location of knots is less important than the number of knots
  - Place knots at fixed predictor quantiles
- Number of knots
  - Flexibility and risk of over-fitting increase with knot count
  - 5 knots or fewer are often sufficient (Stone [SS'86])
  - 4 knots is a good compromise between flexibility and over-fitting
  - Fewer knots are required for small data sets
Significance Testing I

- Approach
  - Given two nested models, the null hypothesis H0 states that the additional predictors in the larger model have no association with the response
  - Test H0 with F-statistics and p-values
- Example
  - Predictor interaction requires comparing nested models
  - Consider the model y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_1 x_2
  - Test the significance of x_1 with the null hypothesis H0 : β_1 = β_3 = 0
Significance Testing II

- F-statistic
  - Compare two nested models using their R² and the F-statistic
  - R² is the fraction of response variance captured by the predictors

    R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²

  - The F-statistic of two nested models follows an F distribution, where R²_* is the statistic of the smaller model and k is the number of additional predictors

    F_{k, n−p−1} = (R² − R²_*) / k × (n − p − 1) / (1 − R²)

- P-values
  - Probability that an F-statistic greater than or equal to the observed value would occur under H0
  - Small p-values cast doubt on H0
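The F-statistic itself is a one-line computation once the two R² values are in hand. A sketch with assumed illustrative numbers (comparing the F value against the F_{k, n−p−1} distribution to get a p-value would need a stats library and is omitted here):

```python
# Sketch of the nested-model F-test. r2_full is from the larger model
# with p predictors; r2_small from the nested model with k fewer
# predictors. All numeric values below are invented for illustration.
def f_statistic(r2_full, r2_small, n, p, k):
    """F_{k, n-p-1} = ((R^2 - R^2_*) / k) * ((n - p - 1) / (1 - R^2))."""
    return ((r2_full - r2_small) / k) * ((n - p - 1) / (1.0 - r2_full))

F = f_statistic(r2_full=0.90, r2_small=0.85, n=1000, p=20, k=2)
# A large F (equivalently a small p-value) casts doubt on H0, i.e. the
# extra k predictors do explain additional response variance.
```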
Treatment of Missing Data

- Missing completely at random (MCAR)
  - Treat unobserved design points as missing data
  - Sampling UAR ensures observations are MCAR
  - Data are missing for reasons unrelated to the characteristics or responses of the configuration
- Informative missing
  - Data are more likely to be missing if their responses are systematically higher or lower
  - "Missingness" is then non-ignorable and must itself be modeled
  - Sampling UAR avoids such modeling complications
Performance Associations I

[Figure: marginal associations between bips and microarchitectural predictors over the N = 4,000 samples, in three panels :: Pipeline (depth, width, phys_reg, resv), Latency (mem_lat, ls_lat, ctl_lat, fix_lat, fpu_lat), and Memory Hierarchy (l2cache_size, icache_size, dcache_size).]
Performance Associations II

[Figure: marginal associations between bips and application-specific predictors over the N = 4,000 samples :: Branch (br_rate, br_mis_rate) and Stalls (stall_inflight, stall_dmissq, stall_cast, stall_storeq, stall_reorderq, stall_resv, stall_rename).]
Performance Associations III

[Figure: marginal associations between bips and cache miss rates (il1miss_rate, dl1miss_rate, dl2miss_rate) and baseline performance (base_bips) over the N = 4,000 samples.]
Significance Tests

- Microarchitectural predictors
  - The majority of F-tests imply significance (p-values < 2.2E−16)
  - Several predictors were less significant
    - Control latency (p-value = 0.1247)
    - Reservation station size (p-value = 0.1239)
    - L1 instruction cache size (p-value = 0.02941)
- Application-specific predictors
  - The majority of F-tests imply significance (p-values < 2.2E−16)
  - Pipeline stalls classified by structure are less significant
    - Completion and reorder queue stalls (p-values > 0.4)
Related Work

- Statistical significance ranking
  - Yi :: Plackett-Burman designs, effect rankings
  - Joseph :: stepwise regression, coefficient rankings
  - Bound parameter values to improve tractability
  - Require simulation for estimation
- Synthetic workloads
  - Eeckhout :: profile workloads to obtain synthetic traces
  - Nussbaum :: superscalar and SMP simulation
  - Obtain distributions of instructions and data dependencies
  - Require simulation with smaller traces for estimation