Complexity-Effective Issue Queue Design Under Load-Hit Speculation
Tali Moreshet and R. Iris Bahar
Brown University
Division of Engineering
Brown University WCED 2002
Motivation
Pipelines are getting deeper
  Higher clock frequencies
  Increased architectural complexity
Speculatively issued instructions are particularly sensitive to pipeline depth
  Branch prediction
  Load hit prediction
Pipeline
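[Figure: pipeline block diagram showing the Instruction Cache, Register Rename Unit, Issue Queue, Register File, Functional Units, and Data Cache across the Fetch, Decode, Issue, and Execute stages, with the forwarding path and the load resolution loop marked.]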
Load Hit Prediction
Issue instructions dependent on a load as soon as possible
  Assume the load hits in the DL1 (level-1 data cache)
BUT… the load's hit status is known only after dependent instructions may already have issued
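The timing argument above can be sketched with a toy model. This is only an illustration under stated assumptions, not the pipeline simulated in the talk: the names `issue_to_exec` and `load_hit_latency` are hypothetical, the load is taken to issue at cycle 0, and hit status is assumed to become known exactly when hit data would be available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTiming:
    # Hypothetical parameters, not taken from the slides.
    issue_to_exec: int     # pipeline stages between issue and execute
    load_hit_latency: int  # cycles from load execute to data on a DL1 hit

    def hit_known_cycle(self) -> int:
        # The load issues at cycle 0; hit/miss is known once it has
        # reached execute and accessed the cache.
        return self.issue_to_exec + self.load_hit_latency

    def dependent_issue_cycle(self, speculate: bool) -> int:
        if speculate:
            # Issue early so the dependent reaches execute just as the
            # load data would be forwarded, assuming a hit.
            return self.load_hit_latency
        # Without speculation, wait until the hit is confirmed.
        return self.hit_known_cycle()

    def speculative_window(self) -> int:
        # Cycles in which dependents can issue before the hit is confirmed.
        return self.hit_known_cycle() - self.dependent_issue_cycle(True)
```

In this model the speculative window equals `issue_to_exec`, so it grows one cycle for every stage added between issue and execute, which is why deeper pipelines expose more instructions to re-issue.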
Example
[Figure: timing diagram over cycles 1 to 8 showing the Issue and Exec stages of LOAD, MULT, SUB, and ADD, with the speculative window marked.]
Example
[Figure: timing diagram over cycles 1 to 9 showing the Issue and Exec stages of LOAD, ADD, MULT, and SUB, with the speculative window marked.]
Example
[Figure: timing diagram over cycles 1 to 10 showing the Issue and Exec stages of LOAD, ADD, MULT, and SUB, with the speculative window marked.]
What Happens On a Load Miss?
Re-issue instructions in the speculative window after a load miss
Keep post-issue instructions in the issue queue long enough to ensure re-issuing will not be necessary
Complexity-Effective Load Hit Speculation
As pipeline depth increases:
  Retain performance benefit
  Consider complexity of re-issue and prediction policies
  Consider impact on issue queue design
Re-Issue Policies
4 different load hit speculation policies:
  1) No load hit speculation
  2) Perfect load hit speculation
  3) Replay only instructions dependent on the load that missed
  4) Replay all instructions in the speculative window
Load hit/miss predictor to limit re-issuing
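The difference between replay policies 3 and 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Instr` type, the function names, and the register names are hypothetical, and the sketch assumes registers have already been renamed so that each destination is written exactly once.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    # Hypothetical instruction record: renamed destination and sources.
    name: str
    dest: str
    srcs: Tuple[str, ...] = ()


def replay_dependents(window: List[Instr], load: Instr) -> List[Instr]:
    """Policy 3: replay only instructions that depend, directly or
    transitively, on the load that missed."""
    tainted = {load.dest}
    replay = []
    for ins in window:  # window is in program order
        if tainted & set(ins.srcs):
            replay.append(ins)
            tainted.add(ins.dest)  # consumers of this result are tainted too
    return replay


def replay_all(window: List[Instr], load: Instr) -> List[Instr]:
    """Policy 4: replay every instruction issued in the speculative window."""
    return list(window)
```

Policy 4 needs no dependence tracking at recovery time, at the cost of re-issuing instructions that were never affected by the miss; policy 3 trades that extra work for the bookkeeping in `tainted`.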
Performance Impact
[Figure: bar chart. Y-axis: performance increase from no load speculation (-5% to 45%). X-axis: Exe1, Exe3, Exe5, Exe7. Series: Perfect_Int, Dep_Int, Dep_Pred_Int, Seq_Int, Seq_Pred_Int, Perfect_FP, Dep_FP, Dep_Pred_FP, Seq_FP, Seq_Pred_FP.]
Impact on Issue Queue Occupancy
[Figure: bar chart. Y-axis: average number of instructions in the issue queue (0 to 40), split into pre-issue and post-issue. Bars: no load speculation, integer benchmarks; no load speculation, floating-point benchmarks; dependent load speculation, integer benchmarks; dependent load speculation, floating-point benchmarks.]
Impact on Issue Queue Occupancy
[Figure: bar chart. Y-axis: percentage of post-issue instructions in the issue queue (0% to 70%). X-axis: Exe1, Exe3, Exe5, Exe7. Series: compress, ijpeg, bzip, Int_avg, apsi, swim, art, wupwise, FP_avg.]
Impact on Issue Queue Occupancy
As pipeline depth increases:
  Issue queue gets cluttered with post-issue instructions (average 55%)
    Limits the available ILP
    Inefficient use of complexity in instruction bid/grant arbitration logic
The Bid / Grant Loop
[Figure: the bid/grant loop. Each of the M issue queue entries sends a request (bid for an issue slot) to the Prioritize & Select logic, which broadcasts grants for N-wide issue.]
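The bid/grant loop can be sketched as a simple select function. This is a sketch under assumptions: the slide does not state the priority rule, so oldest-first (lowest index first) is assumed here, and the name `select` and its parameters are hypothetical.

```python
from typing import List


def select(requests: List[bool], issue_width: int) -> List[int]:
    """Grant up to issue_width of the bidding entries.

    requests[i] is True when issue queue entry i bids for an issue slot;
    index 0 is assumed to be the oldest entry. Returns the granted indices.
    """
    grants = []
    for idx, bidding in enumerate(requests):  # every entry is examined
        if len(grants) == issue_width:
            break
        if bidding:
            grants.append(idx)
    return grants
```

For example, `select([False, True, True, False, True], issue_width=2)` grants entries 1 and 2. The arbitration spans all M entries whether or not they bid, so entries occupied by post-issue instructions add to the cost of the logic without contributing candidates for issue.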
Issue Queue Utilization Problem
Complexity of bid/grant arbitration logic increases with size of the IQ
IQ consists largely of post-issue instructions
  Limiting the available ILP that a large IQ is supposed to provide
Not a complexity-effective design
IQ Design Options
Increase the IQ size
  Improve performance – increase available ILP
  Increase complexity
Simplify arbitration logic – use slower circuitry
  Reduce complexity
  Hurt performance
Reduce IQ size
  Reduce complexity
  Hurt performance
Double Latency of Issue Queue
[Figure: bar chart. Y-axis: performance increase from a 64-entry issue queue with dependent load speculation (-70% to 0%). X-axis: Exe1, Exe3, Exe5, Exe7. Series: compress, ijpeg, bzip, Int_avg, apsi, swim, art, wupwise, FP_avg.]
Smaller IQ (48 Entry)
[Figure: bar chart. Y-axis: performance increase from a 64-entry issue queue with dependent load speculation (-25% to 5%). X-axis: Exe1, Exe3, Exe5, Exe7. Series: compress, ijpeg, bzip, Int_avg, apsi, swim, art, wupwise, FP_avg.]
Complexity-Effective Issue Queue
Goal
  Reduce complexity
  Do not degrade performance
Solution: The Dual Issue Queue
  Move post-issue instructions from main queue to separate replay queue
  Increase available ILP
  Reduce size of main IQ
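The dual issue queue idea can be sketched as two structures with different roles. This is a rough illustration rather than the design evaluated in the talk: the class and method names are hypothetical, instructions are represented by plain strings, and the choices to issue oldest-first, to stall issue when the replay queue is full, and to keep replayed instructions in the replay queue until their load is confirmed are all assumptions.

```python
from collections import deque
from typing import List, Optional, Set


class DualIssueQueue:
    def __init__(self, miq_size: int, riq_size: int) -> None:
        self.miq_size = miq_size
        self.riq_size = riq_size
        self.miq: deque = deque()   # pre-issue instructions (arbitrated)
        self.riq: List[str] = []    # post-issue instructions awaiting resolution

    def dispatch(self, instr: str) -> bool:
        """Insert a renamed instruction into the main queue if there is room."""
        if len(self.miq) >= self.miq_size:
            return False
        self.miq.append(instr)
        return True

    def issue(self) -> Optional[str]:
        """Issue the oldest main-queue instruction and park it in the replay
        queue, freeing its main-queue entry immediately."""
        if not self.miq or len(self.riq) >= self.riq_size:
            return None
        instr = self.miq.popleft()
        self.riq.append(instr)
        return instr

    def load_hit(self, confirmed: Set[str]) -> None:
        """Speculation confirmed: these instructions leave the replay queue."""
        self.riq = [i for i in self.riq if i not in confirmed]

    def load_miss(self, affected: Set[str]) -> List[str]:
        """Load missed: return the instructions that must be re-issued.
        They stay in the replay queue until a later confirmation."""
        return [i for i in self.riq if i in affected]
```

Because issued instructions leave the main queue at once, every main-queue entry holds a candidate for issue, so a smaller main queue can expose the same pre-issue window as a larger standard queue that is partly filled with post-issue instructions.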
Dual Issue Queue
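[Figure: dual issue queue block diagram showing the Register Rename Unit, Main Issue Queue (MIQ), Replay Issue Queue (RIQ), Register File, Functional Units, and Data Cache, with input from the fetch unit and a Replay_req signal.]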
Dual Issue Queue Performance
[Figure: bar chart. Y-axis: performance increase from the standard issue queue with dependent load speculation (-8% to 10%). X-axis: Exe1, Exe3, Exe5, Exe7. Series: compress, ijpeg, bzip, Int_avg, apsi, swim, art, wupwise, FP_avg.]
Conclusion
Load hit speculation is critical for high performance in deeper pipelines
Larger percentage of post-issue instructions in issue queue
Complexity-effective issue queue scheme addresses utilization problem
For deepest pipelines, overall performance improves while reducing complexity of IQ