Defining Wakeup Width for Efficient Dynamic Scheduling
A. Aggarwal, O. Ergin – Binghamton University
M. Franklin – University of Maryland
Presented by: Deniz Balkan
Defining Wakeup Width for Efficient Dynamic Scheduling A. Aggarwal, O. Ergin – Binghamton Univ., M. Franklin – Univ. of Maryland
Dynamic Scheduler
• Workings of a dynamic scheduler
– Wakeup dependent instructions
– Select instructions from a pool of ready instructions
• Both these operations form a critical path
• Increase of a single cycle in this critical path impacts performance
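The wakeup and select steps listed above can be sketched as a small loop. This is a minimal illustration, not the hardware described in the talk: the `Entry` layout, the function names, and the assumption that every instruction's result tag is available one cycle after issue are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)
class Entry:
    """One issue-queue entry (hypothetical layout, for illustration only)."""
    name: str
    dest: Optional[int]                             # result tag; None for branches/stores
    waiting: Set[int] = field(default_factory=set)  # source tags not yet broadcast


def wakeup(queue: List[Entry], broadcast_tags: List[int]) -> None:
    """Wakeup: match each broadcast tag against every waiting source operand."""
    for entry in queue:
        entry.waiting.difference_update(broadcast_tags)


def select(queue: List[Entry], issue_width: int) -> List[Entry]:
    """Select: pick up to issue_width ready entries, oldest (lowest index) first."""
    ready = [entry for entry in queue if not entry.waiting]
    return ready[:issue_width]


def cycle(queue: List[Entry], broadcast_tags: List[int], issue_width: int):
    """Run one wakeup + select step; return issued entries and their result tags."""
    wakeup(queue, broadcast_tags)
    issued = select(queue, issue_width)
    for entry in issued:
        queue.remove(entry)  # identity comparison, since eq=False
    tags = [entry.dest for entry in issued if entry.dest is not None]
    return issued, tags
```

Feeding the tags returned by one call into the next call models back-to-back issue of a dependence chain, which is why wakeup and select together sit on the critical path.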
Implications of a large Dynamic Scheduler
• Large dynamic scheduler has the potential to exploit more ILP
– Larger issue queue
– Larger issue width
• Implications
– Longer wire delays associated with driving register tags
– Longer wire delays in driving tag comparison results
– Longer select logic latency
• Overall increased scheduler latency, resulting in slower clock speed
Contributions of this paper
• Wakeup width definition – effective number of results used for instruction wakeup
– Usually equal to the issue width
• Reduced wakeup width dynamic scheduler
– Issue width remains the same
– Reduces instruction wakeup latency, energy consumption, and area
– Less than 2% reduction in IPC
Program Behavior Study
• Not all instructions produce a result
– Branch and store instructions form about 30%
• Entire issue width of the processor not used in every cycle
• Average number of tags generated per cycle considerably less than the processor issue width
Tags generated in a cycle
• To generate more tags per cycle, used a fetch, issue and commit width of 12
• Almost 50% of cycles have either 0 or 1 tag generated, even with a large issue width
• About 80% of the cycles have 3 or fewer tags generated
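Percentages like the ones above are cumulative fractions over a per-cycle tag count. A minimal sketch of that computation follows; the function name and the input format (one integer per simulated cycle) are assumptions, and no data from the study is reproduced here.

```python
from collections import Counter
from typing import Dict, List


def cumulative_fraction(tags_per_cycle: List[int]) -> Dict[int, float]:
    """Map k to the fraction of cycles that generated k or fewer tags."""
    counts = Counter(tags_per_cycle)
    total = len(tags_per_cycle)
    running = 0
    result = {}
    for k in range(max(tags_per_cycle) + 1):
        running += counts[k]          # Counter returns 0 for missing keys
        result[k] = running / total
    return result
```

Reading the entry for k = 1 gives the "0 or 1 tag" share, and the entry for k = 3 gives the "3 or fewer" share for whatever trace is supplied.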
Useful tags
• Not all the generated tags are immediately useful
– Branch mispredictions lead to tags generated along the wrong path, and to tags that are not immediately required
– Dependent instructions not present in issue queue or waiting for other operands
• Average number of useful tags in a cycle even less than the average number of tags generated in a cycle
Useful tags
Only about 50-60% of instructions produce a tag that is immediately required
Reduced Wakeup Width Dynamic Scheduler
• Wakeup width reduced while keeping the issue width intact
– Some tags may have to wait before waking up the dependent instructions
• Performance impact is not expected to be high
– Soon there will be cycles with fewer tags
– Waiting tags can use the available wakeup slots
– Delaying tags that are not immediately useful may have no performance impact
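The idea that waiting tags drain in later, quieter cycles can be illustrated with a toy model. This sketch assumes a single global first-in-first-out queue of waiting tags, which is a simplification chosen for illustration; the function name and input format are hypothetical.

```python
from collections import deque
from typing import List


def broadcast_schedule(tags_per_cycle: List[list], wakeup_width: int) -> List[list]:
    """Return the tags broadcast in each cycle when at most wakeup_width
    tags can be driven per cycle; extra tags wait in FIFO order."""
    pending = deque()
    schedule = []
    for new_tags in tags_per_cycle:
        pending.extend(new_tags)
        n = min(wakeup_width, len(pending))
        schedule.append([pending.popleft() for _ in range(n)])
    while pending:  # drain tags still waiting after the trace ends
        n = min(wakeup_width, len(pending))
        schedule.append([pending.popleft() for _ in range(n)])
    return schedule
```

For the trace `[["a", "b", "c"], [], ["d"], []]` with a wakeup width of 2, tag `c` is broadcast one cycle late, in a cycle that would otherwise have driven no tags, and the schedule is no longer than it is with a width of 3.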
Hardware Implementation – Conventional DS
• Select logic decides which instruction executes on which FU
• Register tags of issued instructions placed in tag-latches
• Enable signals controlled to enable the drivers that drive the tags across the instruction window
Hardware Implementation – RWW DS
• Wakeup width reduced to half the issue width
• Two tag latches/FUs share common tag-lines
• If both tag-latches hold tags, only one of them is driven; the other remains in its tag-latch
• To prevent overwriting, 1-bit indicator latch used to control the selection process
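The sharing scheme above can be modeled in software as two latches behind one set of tag-lines. This is a behavioral sketch only: the class and method names are hypothetical, and the choice to drive an already-waiting tag before a newly latched one is an assumption, not something the slides state.

```python
class SharedTagLine:
    """Two tag latches (one per FU) that share one set of tag-lines."""

    def __init__(self):
        self.latch = [None, None]        # tag held by each FU's latch
        self.indicator = [False, False]  # True: the latch holds a waiting tag

    def can_issue(self, fu: int) -> bool:
        """An FU whose indicator is set must not receive a new tag."""
        return not self.indicator[fu]

    def load(self, fu: int, tag: int) -> None:
        """Place the tag of a newly issued instruction in the FU's latch."""
        if self.indicator[fu]:
            raise ValueError("would overwrite a waiting tag")
        self.latch[fu] = tag

    def drive(self):
        """Drive one held tag onto the shared lines; any other held tag
        stays in its latch and has its indicator set."""
        held = [fu for fu in (0, 1) if self.latch[fu] is not None]
        if not held:
            return None
        # Assumption: a tag that is already waiting goes first.
        held.sort(key=lambda fu: not self.indicator[fu])
        winner = held[0]
        tag = self.latch[winner]
        self.latch[winner] = None
        self.indicator[winner] = False
        for fu in held[1:]:
            self.indicator[fu] = True
        return tag
```

Calling `can_issue` before `load` plays the role of the indicator latch in the selection process: it is what keeps a waiting tag from being overwritten.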
FU arbiter
• Decides the instruction to be executed on the FU
• Conventional arbiter giving priority to the oldest instruction
– Grant1 = (NOT req0) AND req1 AND enable
• Arbiter with RWW dynamic scheduler, where “a” is the value of the indicator latch for the arbiter
– Grant1 = (NOT req0) AND (NOT a) AND req1 AND enable
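The two grant equations can be checked as Boolean functions. The sketch below reads them with req0 complemented, which is what oldest-first priority requires when req0 is the older request, and it assumes that a = 1 means a tag is waiting in the FU's tag-latch so that no grant may be given. Both the polarity of `a` and the function names are assumptions.

```python
def grant1_conventional(req0: bool, req1: bool, enable: bool) -> bool:
    """Grant the younger request only if the older one is not requesting."""
    return (not req0) and req1 and enable


def grant1_rww(req0: bool, req1: bool, enable: bool, a: bool) -> bool:
    """Same priority rule, additionally blocked while the indicator latch is set."""
    return (not req0) and (not a) and req1 and enable
```

With `a` cleared, the RWW arbiter behaves exactly like the conventional one; with `a` set, it never grants.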
Experimental Setup
• Simulator based on SimpleScalar to collect the performance statistics
• Delay, energy, and area estimation from the actual VLSI layouts using SPICE, in a 0.18-micron, 6-metal-layer CMOS process (TSMC)
• Dynamic scheduler size – 128-entry issue queue, 6-way issue width
Performance Results
• Compared to I6W6 (Issue Width 6, Wakeup Width 6) configuration
– I6W3 has 15% lower wakeup logic latency
• IPC impact about 5% for I6W3
– Higher for high IPC FP benchmarks
– Significantly better than I3W3, with the same wakeup logic latency as I6W3
IPC of FP benchmarks with RWW
Reasons for IPC impact
• Instructions delayed due to waiting tags
• Issue slots wasted because of waiting tags
Reasons for IPC impact
• Delayed register tags have more impact than issue slot wastage
• As the wakeup width is reduced, the impact of delayed register tags increases dramatically
Area and Energy Results
• Activation statistics obtained through simulations, and the energy consumption values from our detailed layouts
– I6W3 reduces wakeup logic energy consumption by 10%
• Area of the CAM cells (tag part of the instruction window) reduces by about 30% for I6W3
Reduced Issue Slots Wastage (RWIS)
• Issue slots are wasted because no instructions are issued to FUs that already have waiting tags
• Classified instructions into
– Tag-producing instructions
– Non-tag-producing instructions
• Can still issue non-tag-producing instructions to FUs with waiting tags without overwriting the tag value
• Type bit included with the instruction to control issue
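The type-bit rule above can be sketched as an issue filter. This is an illustrative model only: the function names, the oldest-first greedy assignment, and the preference for steering non-tag-producing instructions toward FUs with waiting tags are assumptions, not details given in the slides.

```python
from typing import Dict, List, Tuple


def may_issue(produces_tag: bool, indicator_set: bool, rwis: bool = True) -> bool:
    """Can an instruction go to an FU, given its type bit and the FU's indicator?"""
    if not indicator_set:
        return True                    # no waiting tag: anything may issue
    return rwis and not produces_tag   # waiting tag: only non-tag-producers, only with RWIS


def assign(ready: List[Tuple[str, bool]], indicators: List[bool],
           rwis: bool = True) -> Dict[int, str]:
    """Assign ready instructions (name, produces_tag), oldest first, to FUs.
    Returns a mapping from FU index to instruction name."""
    free = list(range(len(indicators)))
    issued = {}
    for name, produces_tag in ready:
        # Assumption: non-tag-producers try FUs with waiting tags first,
        # leaving the other FUs for tag-producing instructions.
        candidates = sorted(free, key=lambda fu: indicators[fu],
                            reverse=not produces_tag)
        for fu in candidates:
            if may_issue(produces_tag, indicators[fu], rwis):
                issued[fu] = name
                free.remove(fu)
                break
    return issued
```

With three FUs where only the first has a waiting tag, and ready instructions `add` (tag-producing), `st`, and `br`, the RWIS rule fills all three FUs, while disabling it (`rwis=False`) leaves the first FU idle.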
Reduced Tag Delays (RTD)
• Register tags delayed when multiple tag-producing instructions issued to the FUs sharing the tag-lines (FU-group)
• RTD limits the number of tag-producing instructions issued to an FU-group
– Waiting tags of the previous cycle used for this purpose
• Non-tag-producing instructions can still be issued to FUs with indicator bits set
Enhanced Performance
• RTD-1 (with a maximum of 1 waiting tag) is the most effective
• RWIS reduces the wastage of issue slots; RTD also reduces waiting register tags
• RTD-2 results in more instructions getting delayed (compared to RTD-1) due to waiting register tags
Conclusions
• Larger dynamic schedulers can exploit more ILP, thus increasing performance
• Larger dynamic scheduler results in longer scheduler latency
• Reduced wakeup width (RWW) dynamic scheduler exploits the property that the number of useful tags generated per cycle is significantly smaller than the issue width
• Significant reduction in wakeup logic latency and dynamic scheduler area and energy consumption with minimal IPC impact