Automatic Insertion of Low Power Annotations in RTL for Pipelined Microprocessors

Date post: 21-Jan-2016
Vinod Viswanath

The University of Texas at Austin

Outline

• Power Dissipation in Hardware Circuits

• Instruction-driven Slicing to attain lower power dissipation
  – Automatically annotates microprocessor description
  – At the Register Transfer Level and Architectural level

• Applying Instruction-driven Slicing to pipelined architectures

• Applying Instruction-driven Slicing to out-of-order superscalar architectures

Power Dissipation

• Switching activity power dissipation
  – To charge and discharge nodes

• Short-circuit power dissipation
  – High only for output drivers, clock buffers

• Static power dissipation
  – Due to leakage current
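
For reference, the switching component listed above is commonly written with the standard first-order CMOS model below. The formula is textbook background and does not appear on the slides; the symbol names are the usual ones, chosen here for illustration.

```latex
P_{\text{switching}} = \alpha \, C_L \, V_{dd}^{2} \, f
```

Here $\alpha$ is the activity factor (the fraction of clock cycles in which a node switches), $C_L$ is the switched load capacitance, $V_{dd}$ is the supply voltage, and $f$ is the clock frequency. Gating a block that an instruction does not need lowers $\alpha$ for that block, which is the lever the rest of the talk relies on.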

Switching Activity Power Dissipation

• Transistor-level
  – Reordering, sizing

• Gate-level
  – Don’t-care optimizations (combinational)
  – Encoding (sequential)
  – Pre-computation based optimization (sequential)
  – Guarded evaluation (sequential)

• RT-level
  – Use program structure and dataflow information available at that level of abstraction

Instruction-driven Slice

• An instruction-driven slice of a microprocessor design is
  – All the relevant circuitry of the design required to completely execute a specific instruction
  – Parts of the decode, execute, writeback etc. blocks
  – Cone of influence of the semantics of the instruction

Instruction-driven Slicing

• Given a microprocessor design and an instruction
  – Identify the instruction-driven slice
  – Shut off the rest of the circuitry

• This might include
  – Gating out parts of different blocks
  – Gating out floating point units during integer ALU execution
  – Turning off certain FSMs in different control blocks, since exact constraints on their inputs are available due to instruction-driven slicing
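
The two steps above (identify the slice, shut off the rest) can be sketched as a backward reachability pass over a dependency graph. This is a minimal illustration, not the authors' tool: the netlist, the signal names, and the per-instruction edge guards are all hypothetical, and a real slice would be computed on the RTL rather than on a hand-written dictionary.

```python
from collections import deque

# Hypothetical toy design: EDGES[signal] lists (source, instructions) pairs,
# meaning `signal` reads `source` only while one of `instructions` executes.
# None means the dependency is active for every instruction.
EDGES = {
    "wb_data": [("alu_out", {"add"}), ("fpu_out", {"fadd"}), ("lsu_out", {"load"})],
    "alu_out": [("op_a", None), ("op_b", None)],
    "fpu_out": [("fp_a", None), ("fp_b", None)],
    "lsu_out": [("addr", None)],
    "addr":    [("op_a", None), ("imm", None)],
}
FLOPS = {"op_a", "op_b", "fp_a", "fp_b", "imm"}  # hypothetical state elements


def instruction_slice(instr, roots=("wb_data",)):
    """Backward reachability from the roots along edges active for `instr`."""
    seen, work = set(roots), deque(roots)
    while work:
        sig = work.popleft()
        for src, active in EDGES.get(sig, []):
            if (active is None or instr in active) and src not in seen:
                seen.add(src)
                work.append(src)
    return seen


def gateable_flops(instr):
    """Flops outside the slice: candidates for gating while `instr` executes."""
    return FLOPS - instruction_slice(instr)
```

In this toy design, slicing on an integer add leaves the floating point operand flops outside the cone of influence, which mirrors the "gating out floating point units during integer ALU execution" case on the slide.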

Algorithm (High Level)

• Algorithm instruction-driven-slicing.

Begin

• Inputs: vRTL (Verilog RTL), insts (instructions)
• Output: aRTL (Annotated RTL)
• Parse vRTL to obtain the Abstract Syntax Program Graph (ASPG)
• For each instruction I in insts repeat
  – Slice the ASPG for instruction I
  – Traverse the ASPG
  – Add annotation variables if such a block is found
  – If a particular flop is already gated, then add the current annotation in an optimal fashion
  – Return the annotated ASPG
• Generate Verilog code (aRTL) for the annotated ASPG

End.
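
The per-instruction loop, including the "flop is already gated" case, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the slices are given as plain sets of flop names, the merge rule is simply to extend the existing gating condition, and the `is_<instr>` decode flags and `<flop>_en` enable names in the emitted Verilog are hypothetical.

```python
def annotate(flops, slices):
    """Map each gateable flop to the instructions during which it may be gated.

    `flops` is the set of all flops; `slices` maps an instruction name to the
    set of flops in its slice. Both stand in for the sliced ASPG.
    """
    gate_when = {}
    for instr, needed in slices.items():
        for flop in sorted(flops - needed):
            # If the flop was already gated for an earlier instruction,
            # extend that annotation instead of adding a second gate.
            gate_when.setdefault(flop, []).append(instr)
    return gate_when


def emit_enable(flop, instrs):
    """Render one annotation as a Verilog-style enable assignment."""
    idle = " | ".join(f"is_{i}" for i in instrs)
    return f"assign {flop}_en = ~({idle});"
```

A flop stays enabled for every instruction that was not sliced on, so instructions outside `insts` behave exactly as before; only the listed instructions can switch a flop off.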

or1200_ctrl.lsu_op

Methodology

• In order to demonstrate our technique
  – We have incorporated instruction-driven slicing as part of the traditional design flow

• The vRTL model is annotated to obtain the aRTL model

• The Synopsys Design Environment has been modified to accept the aRTL, SPEC2000 benchmarks, and power process parameters, and to estimate the power dissipation due to switching activity

• The annotated Architectural model is fed to the SimpleScalar simulator with the Wattch power estimator to estimate the power dissipation

Methodology

Experiment: OR1200

• We have used our tool-chain to test our methodology on OR1200
  – OR1200 is a single-instruction-issue pipelined microprocessor implementing the OpenRISC ISA.

• 4-stage integer pipeline with single instruction issue per cycle

• We have annotated both the RTL and the architectural models of OR1200

Experiment: OR1200

OR1200-RTL Results

• Results are shown after annotation insertion
  – Sliced on 1, 4, 10 instructions

• For SPECINT2000 benchmarks

• Power dissipation decreases consistently

OR1200-Arch Results

• Results are shown after annotation insertion
  – Sliced on 1, 4, 10 instructions

• For SPECINT2000 benchmarks

• Power dissipation decreases consistently

OR1200 Results (contd.)

• Power gains are consistently good

• Power gains far outperform area losses

OR1200 Results (contd.)

• Flop distribution shown before slicing (Fig. a), after slicing on add, l.add (Fig. b), and after slicing on load, l.lw (Fig. c)

[Fig. a, Fig. b, Fig. c: flop distribution plots]

Experiment: PUMA

• We have used our tool-chain to test our methodology on PUMA
  – PUMA is a dual-issue, out-of-order superscalar, fixed-point PowerPC core

• We have annotated both the RTL and the architectural models of PUMA

PUMA Results (contd.)

• Power gains are good upon slicing for a few instructions (~7) before delay losses start dominating (Fig. 1)

• Power gains far outperform area losses (Fig. 2)

• Flop distribution shown before slicing (Fig. 3a), after slicing on add (Fig. 3b), and after slicing on load (Fig. 3c)

[Fig. 3a, Fig. 3b, Fig. 3c: flop distribution plots]

[Fig. 1: PUMA-RTL Power vs. Delay. Axis labels: Instruction-driven slicing; %-age Power gain, Area loss. Series: Power, Delay]

[Fig. 2: PUMA-RTL Power vs. Area. Axis labels: Instruction-driven slicing; %-age Power gain, Area loss. Series: Power, Area]

Conclusions

• Proposed Instruction-driven Slicing as a new technique to automatically reduce power dissipation

• Implemented the methodology of incorporating instruction-driven slicing into the design flow tool-chain

• Inserting these annotations preserves the functionality of the circuit

Conclusions (continued)

• This technique seems most applicable to single-issue, multi-stage pipelined machines.

• When there are multiple instructions in flight in the same pipeline stage, the gains of a single-instruction abstraction are lost.

• Graphics processors and various embedded applications are often better suited for this technique than general-purpose out-of-order superscalars.
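
The multi-issue point can be illustrated with a small sketch: when several instructions share a pipeline stage, only the flops outside the union of their slices can be gated. The slices and flop names below are hypothetical and the numbers they produce describe this toy example only, not OR1200 or PUMA.

```python
from itertools import combinations_with_replacement

# Hypothetical slices: the flops each instruction needs.
SLICES = {
    "add":  {"op_a", "op_b"},
    "load": {"op_a", "imm", "addr"},
    "fadd": {"fp_a", "fp_b"},
}
ALL_FLOPS = set().union(*SLICES.values())


def gated_fraction(in_flight):
    """Fraction of flops that can be gated while `in_flight` share a stage."""
    needed = set().union(*(SLICES[i] for i in in_flight))
    return 1 - len(needed) / len(ALL_FLOPS)


def average_gated_fraction(width):
    """Average over every unordered group of `width` co-issued instructions."""
    groups = list(combinations_with_replacement(sorted(SLICES), width))
    return sum(gated_fraction(g) for g in groups) / len(groups)
```

In this toy setup the average gateable fraction shrinks as the issue width grows, because the union of the slices covers more of the design.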

Spare slides

PUMA: a fixed point PowerPC core

PUMA Power Gain Results

• Results are shown after annotating the RTL (left) and Architectural (right) models

• For un-sliced and sliced on 1, 4, 10 instructions

• For SPECINT2000 benchmarks

• Power dissipation decreases consistently

Comparing OR1200 and PUMA

Correct Annotations

• Notion of correctness
  – Original RTL and the annotated RTL should be functionally equivalent under all conditions

• Correctness theorem

    (defthm or1200_slicing_correct
      (equal (or1200_cpu n)
             (or1200_cpu_sliced n)))
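
To make the notion of functional equivalence concrete, the sketch below compares an ungated and a gated version of a tiny two-operation pipeline on every input trace up to a fixed length. It is a bounded simulation check on a hypothetical toy design, not the ACL2 proof described in the talk, and it says nothing about OR1200 itself.

```python
from itertools import product

MASK = 0xF  # 4-bit toy datapath


def step(state, op, x, gated):
    """Advance one clock cycle; return (next_state, output of this cycle)."""
    op_reg, a_reg, m_reg = state
    out = (a_reg + 1) & MASK if op_reg == "add" else (m_reg * 2) & MASK
    # Ungated: both operand flops load every cycle.
    # Gated: each flop loads only when the incoming instruction uses it.
    next_a = x if (not gated or op == "add") else a_reg
    next_m = x if (not gated or op == "mul") else m_reg
    return (op, next_a, next_m), out


def run(trace, gated):
    """Simulate a list of (op, operand) pairs from a fixed reset state."""
    state, outs = ("add", 0, 0), []
    for op, x in trace:
        state, out = step(state, op, x, gated)
        outs.append(out)
    return outs


def equivalent_up_to(depth, values=(0, 3, 15)):
    """Compare both models on every input trace of length `depth`."""
    stimuli = list(product(("add", "mul"), values))
    return all(run(t, False) == run(t, True)
               for t in product(stimuli, repeat=depth))
```

The gated model is equivalent here because each output only reads the operand flop that was loaded together with its own opcode. A theorem prover establishes the same property for all trace lengths, which exhaustive simulation cannot do.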

ACL2 Theorem Prover

• General-purpose first-order logic theorem prover

• Break down the theorem into sub-goals

• Many engines work on the sub-goals and will either prove them or break them down further and add them to the central pool of goals to be proved

• Success story in hardware
  – Verified FDIV in the AMD processors

Proof Methodology

Proof Methodology

• The RTL is a shallow embedding in ACL2

• Convert Verilog RTL into ACL2RTL

• We have created a large RTL library to recognize as well as analyze ACL2RTL

• Slicing is done on the Verilog code

• Both original and annotated Verilog are converted into ACL2 and we construct the functional equivalence proof in ACL2

Verilog to ACL2

Proof Structure

• Create a library of functions to interpret the ACL2 model of the RTL

• Functional equivalence theorem is built up block by block

