
Reducing the Complexity of the Register File in Dynamic Superscalar Processors

Date post: 01-Jan-2016
Transcript
Page 1: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Reducing the Complexity of the Register File in Dynamic Superscalar Processors

Rajeev Balasubramonian, Sandhya Dwarkadas, and David H. Albonesi

In Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 237–248, MICRO 2001.

Nathir Rawashdeh

University of Massachusetts, Amherst

Low Power Architecture, Professor Moritz

Note: This presentation is, to a large extent, a reproduction of slides created by the School of Electrical Engineering at Korea University. I have altered them and added new slides to better suit my audience.

Nathir Rawashdeh (3 November 2003)

Page 2: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Contents

• Motivation

• Reduce register file size: Two-Level Register File (1st Technique)

• Reduce port complexity: Banked Organization (2nd Technique)

• Evaluation

– Two-Level Register File Evaluation

– Banked Register File Evaluation

– Combining the Two Techniques

Page 3: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Motivation

• Modern high-performance processors use an out-of-order superscalar core to dynamically extract instruction-level parallelism (ILP) from running applications.

• The core examines a large window of in-flight instructions to find and issue multiple ready, independent instructions every cycle.

• A larger instruction window:

– Achieves better ILP

– Requires a larger register file, issue queue, and reorder buffer.

• A large multi-ported register file can potentially compromise clock cycle time in future wire-limited technologies.

• Two methods suggested in this paper:

– Two-level register file organization, to reduce register file size requirements.

– Banked organization, to reduce port complexity.

Page 4: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Motivation

Conventional Register File Organization

• Logical registers are renamed to physical registers

• At instructions 1 and 2, lr5 is renamed to pr18.

• The branch at instruction 3 is predicted not taken, so pr18 must be kept in case of a misprediction. The write to lr5 at instruction 5 must be allocated a new register, pr27.

• pr18 can only be released to the free list after instruction 5 commits. From then on, lr5 (as written at instruction 5) is mapped to pr27.
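The renaming steps above can be sketched in a few lines of Python. This is only an illustrative model, not the paper's hardware: the `Renamer` class, its method names, and the extra free register pr31 are assumptions made for the example. Only lr5, pr18, and pr27 come from the slide.

```python
from collections import deque


class Renamer:
    """Toy rename stage: a map table plus a free list of physical registers."""

    def __init__(self, map_table, free_registers):
        self.map_table = dict(map_table)        # logical -> physical
        self.free_list = deque(free_registers)  # physical registers not in use

    def rename_dest(self, lr):
        """Allocate a new physical register for a write to logical register lr.

        Returns (new_pr, old_pr). old_pr is NOT freed here: it must survive
        until the overwriting instruction commits, in case an earlier branch
        turns out to be mispredicted.
        """
        old_pr = self.map_table[lr]
        new_pr = self.free_list.popleft()
        self.map_table[lr] = new_pr
        return new_pr, old_pr

    def commit_overwriter(self, old_pr):
        """The overwriting instruction committed, so the old value is dead."""
        self.free_list.append(old_pr)


# lr5 currently lives in pr18; pr27 is next on the free list (pr31 is made up).
renamer = Renamer({5: 18}, [27, 31])
new_pr, old_pr = renamer.rename_dest(5)   # instruction 5 writes lr5
held_before_commit = old_pr not in renamer.free_list
renamer.commit_overwriter(old_pr)         # instruction 5 commits
```

The point of the sketch is the gap between `rename_dest` and `commit_overwriter`: during that interval pr18 occupies a register even though it may have no further readers, which is the inefficiency the two-level organization targets.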

Page 5: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Two-Level Register File (1st Technique)

• Level One (L1) Register File: holds register values that have potential readers.

• Level Two (L2) Register File: keeps the other register values, which are waiting to be released after their instructions commit.

• Effects:

– Reduced register file access time, because only a smaller portion (L1) of the register file is on the critical path.

– More energy is needed to copy register contents between L1 and L2.

Page 6: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Two-Level Register File: Microarchitectural Changes

• Assumption: 8-way issue processor.

• During rename, logical registers are mapped only to L1 physical registers; L2 registers are hidden from the rename process.

Page 7: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Two-Level Register File: Usage Table

• Monitors the usage statistics for each L1 physical register.

• Information maintained per register:

– Pending consumer counter: keeps track of the number of pending consumers of that value. Incremented during rename by each instruction that sources the register. Decremented during issue by the same instruction, or when that instruction is squashed after a mispredict.

– Overwrite bit (single bit): set when the physical register is no longer the latest mapping for its logical register (the lr's mapping has changed to a different pr).

– "Result-written" bit: indicates whether a result has been written into the physical register.

– Sequence number 1: the sequence number of the branch immediately following the instruction that writes to this physical register.

– Sequence number 2: the sequence number of the branch immediately preceding the next instruction that writes to the same logical register.

• Sequence number counter size: log2(ROB size) bits.
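A usage table entry with the fields listed above might be modelled as follows. This is a rough sketch: the class and method names are hypothetical, and the eligibility test in `can_move_to_l2` (value written, mapping overwritten, no pending consumers) is an assumption about how the fields are combined, since the slide lists the fields but not the exact rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageEntry:
    """Usage statistics for one L1 physical register (illustrative model)."""
    pending_consumers: int = 0
    overwritten: bool = False          # no longer the latest mapping for its lr
    result_written: bool = False       # the producer has written its result
    seq_after_writer: Optional[int] = None        # sequence number 1
    seq_before_next_writer: Optional[int] = None  # sequence number 2

    def on_rename_source(self):
        """An instruction that reads this register has been renamed."""
        self.pending_consumers += 1

    def on_issue_or_squash(self):
        """That instruction issued, or was squashed after a mispredict."""
        self.pending_consumers -= 1

    def can_move_to_l2(self):
        # ASSUMPTION: all three conditions must hold before the value may
        # leave L1; the slide does not spell this rule out.
        return (self.result_written
                and self.overwritten
                and self.pending_consumers == 0)


def seq_counter_bits(rob_size):
    """Bits for a sequence number counter: ceil(log2(ROB size))."""
    return (rob_size - 1).bit_length()


entry = UsageEntry()
entry.on_rename_source()        # one consumer renamed
entry.result_written = True
entry.overwritten = True
blocked = entry.can_move_to_l2()    # still has a pending consumer
entry.on_issue_or_squash()          # the consumer issues
eligible = entry.can_move_to_l2()
```

The ROB size passed to `seq_counter_bits` is a free parameter here; the slides do not state the ROB size used in the evaluation.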

Page 8: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Two-Level Register File

• L2 ID valid bit: a single bit added to each ROB entry. It indicates that the destination register ID in that entry corresponds to an L2 register.

Page 9: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Two-Level Register File

Copy List

• Keeps track of L1-to-L2 copies for recovery from a branch mispredict.

• Information maintained for each L2 entry:

– The L1 physical register name that had earlier contained the value.

– The sequence number of the branch immediately following the instruction that writes to this physical register.

– The sequence number of the branch immediately preceding the next instruction that writes to the same logical register.

• The two stored branch sequence numbers delimit the live period of a physical register value, that is, the period during which instructions sourcing this value are dispatched.
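One way to read the live period is as an interval test on the sequence number of a mispredicted branch. The sketch below assumes that a value must be brought back to L1 when the mispredicted branch falls inside its live period (inclusive at both ends), and that sequence numbers do not wrap around. Both are simplifying assumptions, and `CopyListEntry` and `entries_to_restore` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CopyListEntry:
    """Bookkeeping for one value that was copied from L1 to L2."""
    l1_name: int    # L1 physical register that earlier held the value
    seq_first: int  # branch immediately following the writer
    seq_last: int   # branch immediately preceding the next writer of the same lr


def entries_to_restore(copy_list, mispredicted_seq):
    """Return {l2_id: l1_name} for values live at the mispredicted branch.

    ASSUMPTION: if the mispredicted branch lies within [seq_first, seq_last],
    the writer is older than the branch (so the value is valid) and the next
    writer is younger (so it is squashed). The value is then current again
    and may be sourced by correct-path instructions.
    """
    return {
        l2_id: entry.l1_name
        for l2_id, entry in copy_list.items()
        if entry.seq_first <= mispredicted_seq <= entry.seq_last
    }


# Two made-up L2 entries keyed by L2 register id.
copy_list = {
    0: CopyListEntry(l1_name=18, seq_first=3, seq_last=7),
    1: CopyListEntry(l1_name=22, seq_first=8, seq_last=9),
}
restore_for_branch_5 = entries_to_restore(copy_list, 5)
```

A real implementation would also have to handle sequence number wraparound, which this sketch ignores.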

Page 10: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Minimally-Ported Banked Register File (2nd Technique)

Motivation

• The large number of register file ports in a wide-issue processor:

– Increases complexity -> more power consumption

– Increases reg. file access time -> will limit clock speed in future wire-limited technologies.

• The number of ports required on average is far lower than the actual port count, which is sized for the worst case.

• Reasons:

– Many operands are read off the bypass network, not from the reg. file.

– Many instructions have only a single register operand.

– A number of instructions produce results that are not written to the register file (branches, stores, and the effective address computation part of a load or store).

Page 11: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Minimally-Ported Banked Register File

Page 12: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Evaluation

Metrics used to evaluate the Two-Level Register File Organization and the Banked Register File Organization:

• IPC: instructions per cycle

• IPS: instructions per second = IPC / Access Time

• Assuming register file access time is the bottleneck, IPS is a better measure than IPC.
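A small worked example shows why the two metrics can disagree. The IPC and access time values below are invented for illustration and are not results from the paper; the function simply applies IPS = IPC / Access Time under the slide's assumption that register file access time sets the cycle time.

```python
def ips(ipc, access_time):
    """Instructions per unit time, assuming the register file access time
    is the bottleneck and therefore determines the clock cycle."""
    return ipc / access_time


# Hypothetical design points (access time in arbitrary units).
large_file = ips(ipc=1.65, access_time=1.0)   # more registers, slower access
small_file = ips(ipc=1.60, access_time=0.8)   # fewer registers, faster access

# The small file loses on IPC but wins on IPS.
small_wins_on_ips = small_file > large_file
```

With these made-up numbers the smaller file gives up about 3% in IPC yet comes out ahead on IPS, which is the kind of trade-off the IPS metric is meant to expose.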

Page 13: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Two-Level Register File Evaluation: IPC (single vs. two-level reg. file)

[Figure: IPC of the single-level and two-level organizations; annotated values 1.63 and 1.65]

• Gap between the two lines: the addition of the L2 frees up more L1 registers.

• The two-level organization reaches an IPC of 1.67 with just 80 L1 registers (and 80 L2 registers).

• The single-level organization requires as many as 140 registers to attain an IPC of 1.65.

• Out of 140 physical registers, only about 80 are active at any given time.

• The remaining 60 do not have any consumers unless there is a misprediction or exception, so they can be moved away to the L2.

Page 14: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Two-Level Register File Evaluation

[Figure: IPS (single vs. two-level reg. file), with the maximum of each curve marked]

• For the single-level register file, IPS peaks for a 100-entry register file.

• For the two-level register file, the peak IPS value is seen for a 60-entry L1.

• The optimal IPS with the two-level organization is 17% better than the optimal IPS with a single-level register file (better access time with the two-level design).

Page 15: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Two-Level Register File Evaluation: IPS on individual applications

• The 100-L1 has the longest access time, but its IPS is not always worse than that of the 60-L1. In those cases, the 100-L1's higher IPC outweighs the access time penalty.

• The two-level organization achieves the best IPS because it maintains a low access time and an IPC comparable (within 1%) to the single-level 100-L1 design.

Page 16: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Banked Register File Evaluation

• Reg. file with N banks, each with a single read port and a single write port.

• Base case: "Single bank, 4rd, 4wr" is within 2% of the 24-ported case.

• Third bar: penalty from read port conflicts, 1% IPC degradation.

• Fourth bar: additional penalty from write port conflicts, 5% IPC degradation.

• Port contention is worst for applications with high ILP.

Page 17: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Banked Register File Evaluation: reducing conflicts by moving from 4 to 8 banks

• With 8 banks -> almost no IPC degradation due to read/write port conflicts (compared to 4 banks in the previous figure).

• Still a 2% IPC loss relative to the 24-ported design.
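The effect of the bank count on port conflicts can be illustrated with a simple counting model. The sketch assumes registers are interleaved across banks by `register % num_banks`, which is an illustrative choice and not necessarily the mapping used in the paper, and it looks at a single cycle's read requests in isolation. The function name and the register numbers are made up.

```python
from collections import Counter


def port_conflicts(registers, num_banks, ports_per_bank=1):
    """Count accesses in one cycle that exceed each bank's port budget.

    ASSUMPTION: register r lives in bank r % num_banks. Every access beyond
    ports_per_bank to the same bank counts as one conflict.
    """
    per_bank = Counter(reg % num_banks for reg in registers)
    return sum(max(0, count - ports_per_bank) for count in per_bank.values())


# Four hypothetical register reads requested in the same cycle.
reads = [0, 4, 8, 13]
conflicts_4_banks = port_conflicts(reads, num_banks=4)  # 0, 4, 8 all hit bank 0
conflicts_8_banks = port_conflicts(reads, num_banks=8)  # only 0 and 8 share bank 0
```

Here the same four reads cause two conflicts with 4 banks and one with 8 banks. The same function applies to write ports by passing the destination registers instead.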

Page 18: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Combining the Two Techniques

Page 19: Reducing the Complexity of the Register File  in Dynamic Superscalar Processors

Summary of various Organizations

• The two-level organization has slightly lower IPC than the single-level one, but 17% better IPS due to shorter L1 access times. It pays an energy penalty for copying between L1 and L2.

• The banked (single port per bank) reg. file has a shorter access time (by more than a factor of 2) and needs 18 times less energy than a conventional organization.

• The choice of technique depends on the design goals.

