
Closely-Coupled Timing-Directed Partitioning in HAsim

Date post: 08-Feb-2016
Description:
Closely-Coupled Timing-Directed Partitioning in HAsim. Michael Pellauer†, Murali Vijayaraghavan†, Michael Adler‡, Arvind†, Joel Emer†‡. †MIT CS and AI Lab, Computation Structures Group. ‡Intel Corporation, VSSAD Group. To Appear In: ISPASS 2008.
Transcript
Page 1: Closely-Coupled Timing-Directed Partitioning in HAsim

Closely-Coupled Timing-Directed Partitioning in HAsim

Michael Pellauer†, Murali Vijayaraghavan†, Michael Adler‡, Arvind†, Joel Emer†‡

†MIT CS and AI Lab, Computation Structures Group

‡Intel Corporation, VSSAD Group

To Appear In: ISPASS 2008

Page 2: Closely-Coupled Timing-Directed Partitioning in HAsim

Motivation

We want to simulate target platforms quickly
We also want to construct simulators quickly
Partitioned simulators are a known technique from traditional performance models:

Functional Partition
• ISA
• Off-chip communication

Timing Partition
• Micro-architecture
• Resource contention
• Dependencies

[Figure: the Timing Partition and the Functional Partition, joined by an arrow labeled "Interaction"]

• Simplifies timing model
• Amortize functional model design effort over many models
• Functional Partition can be extremely FPGA-optimized

Page 3: Closely-Coupled Timing-Directed Partitioning in HAsim

Different Partitioning Schemes

As categorized by Mauer, Hill and Wood:

Source: [MAUER 2002], ACM SIGMETRICS

We believe that a timing-directed solution will ultimately lead to the best performance

Both partitions on the FPGA

Page 4: Closely-Coupled Timing-Directed Partitioning in HAsim

Functional Partition in Software Asim

• Get Instruction (at a given Address)
• Get Dependencies
• Get Instruction Results
• Read Memory*
• Speculatively Write Memory* (locally visible)
• Commit or Abort instruction
• Write Memory* (globally visible)

* Optional depending on instruction type
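
The operation list above can be read as an interface that a timing model drives one call at a time. The following is a minimal Python sketch of that idea, assuming a toy three-instruction ISA. Every class, method and field name here (`Inst`, `InFlight`, `FunctionalPartition`, `run_unpipelined`, the store buffer) is hypothetical and is not taken from Asim or HAsim.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Inst:
    op: str    # "addi", "load" or "store" (toy ISA, assumed for illustration)
    reg: int   # destination register, or the data register for a store
    base: int  # source register (also the address base for load/store)
    imm: int   # immediate operand or address offset

@dataclass
class InFlight:
    inst: Inst
    result: Optional[int] = None  # value written back at commit
    addr: Optional[int] = None    # effective address for load/store

class FunctionalPartition:
    def __init__(self, program):
        self.program = program   # address -> Inst
        self.regs = [0] * 4
        self.memory = {}         # globally visible memory
        self.store_buffer = {}   # speculative, locally visible writes

    def get_instruction(self, address):
        return InFlight(self.program[address])

    def get_dependencies(self, f):
        srcs = [f.inst.base]
        if f.inst.op == "store":
            srcs.append(f.inst.reg)
        return srcs

    def get_results(self, f):
        value = self.regs[f.inst.base] + f.inst.imm
        if f.inst.op == "addi":
            f.result = value
        else:
            f.addr = value

    def read_memory(self, f):
        # A load sees speculative stores before globally visible memory.
        f.result = self.store_buffer.get(f.addr, self.memory.get(f.addr, 0))

    def speculative_write(self, f):
        self.store_buffer[f.addr] = self.regs[f.inst.reg]

    def commit(self, f):
        if f.inst.op != "store":
            self.regs[f.inst.reg] = f.result

    def abort(self, f):
        # Discard any speculative write made by this instruction.
        self.store_buffer.pop(f.addr, None)

    def write_memory(self, f):
        self.memory[f.addr] = self.store_buffer.pop(f.addr)

def run_unpipelined(fp, addresses):
    """Simplest possible timing model: one instruction at a time, in order."""
    for address in addresses:
        f = fp.get_instruction(address)
        fp.get_dependencies(f)
        fp.get_results(f)
        if f.inst.op == "load":
            fp.read_memory(f)
        if f.inst.op == "store":
            fp.speculative_write(f)
        fp.commit(f)
        if f.inst.op == "store":
            fp.write_memory(f)

program = {
    0: Inst("addi", 1, 0, 5),    # r1 = r0 + 5
    1: Inst("store", 1, 0, 16),  # mem[r0 + 16] = r1
    2: Inst("load", 2, 0, 16),   # r2 = mem[r0 + 16]
}
fp = FunctionalPartition(program)
run_unpipelined(fp, [0, 1, 2])
print(fp.regs, fp.memory)
```

In this sketch the timing model decides when each operation happens, and the functional partition only decides what it computes. A pipelined or out-of-order timing model would issue the same calls, interleaved across several in-flight instructions.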

Page 5: Closely-Coupled Timing-Directed Partitioning in HAsim

Execution in Phases

F D X R C
F D X W C W
F D X C
F D X R A
F D X X C W

The Emer Assertion: All data dependencies can be represented via these phases
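
The five phase sequences on this slide can be checked mechanically. The sketch below is an illustration only: the slide gives no legend, so reading the letters as Fetch, Decode, eXecute, Read memory, Write memory, Commit and Abort is an assumption, and the regular expression is simply one grammar that fits the five sequences shown, not a definition from the paper.

```python
import re

# Assumed shape: F D, one or more X, an optional memory phase (R or W),
# then either A, or C optionally followed by a final W.
PHASES = re.compile(r"F D X( X)*( [RW])? (A|C( W)?)")

def is_valid(sequence):
    """Return True if the space-separated phase string fits the assumed grammar."""
    return PHASES.fullmatch(sequence) is not None

SLIDE_SEQUENCES = [
    "F D X R C",
    "F D X W C W",
    "F D X C",
    "F D X R A",
    "F D X X C W",
]

for seq in SLIDE_SEQUENCES:
    print(seq, is_valid(seq))
```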

Page 6: Closely-Coupled Timing-Directed Partitioning in HAsim

Detailed Example: 3 Different Timing Models

Executing the same instruction sequence:

Page 7: Closely-Coupled Timing-Directed Partitioning in HAsim

Functional Partition in Hardware?

Requirements
• Support these operations in hardware
• Allow for out-of-order execution, speculation, rollback

Challenges
• Minimize operation execution times
• Pipeline wherever possible
• Tradeoff between BRAM/multiport RAMs
• Race conditions due to extreme parallelism

Page 8: Closely-Coupled Timing-Directed Partitioning in HAsim

Functional Partition As Pipeline

Conveys concept well, but poor performance

[Figure: the Timing Model driving the Functional Partition, drawn as a pipeline of stages (Token Gen, Fet, Dec, Exe, Mem, LCom, GCom) over Register State (RegFile) and Memory State]

Page 9: Closely-Coupled Timing-Directed Partitioning in HAsim

Implementation: Large Scoreboards in BRAM

Series of tables in BRAM
• Store information about each in-flight instruction
• Tables are indexed by “token”
• Also used by the timing partition to refer to each instruction
• New operation “getToken” to allocate a space in the tables
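
The token-indexed tables can be pictured in software as parallel arrays that share one index. This is a minimal sketch under stated assumptions: the table names, the table size and the free-list policy are all hypothetical, and Python lists merely stand in for BRAM.

```python
from collections import deque

class Scoreboard:
    def __init__(self, num_tokens=4):
        # Free list of token numbers (assumed allocation policy: FIFO reuse).
        self.free_tokens = deque(range(num_tokens))
        # One table per kind of information, all indexed by the same token.
        self.instruction = [None] * num_tokens
        self.result = [None] * num_tokens
        self.in_flight = [False] * num_tokens

    def get_token(self):
        """Allocate a slot in every table; None means the caller must stall."""
        if not self.free_tokens:
            return None
        token = self.free_tokens.popleft()
        self.instruction[token] = None
        self.result[token] = None
        self.in_flight[token] = True
        return token

    def release(self, token):
        """Return a committed or aborted instruction's slot to the free list."""
        self.in_flight[token] = False
        self.free_tokens.append(token)

sb = Scoreboard(num_tokens=2)
t0 = sb.get_token()
t1 = sb.get_token()
sb.instruction[t0] = "add r1, r2, r3"
stalled = sb.get_token()   # no free slot left
sb.release(t0)
t2 = sb.get_token()        # reuses the slot that t0 held
```

Because the timing partition holds only the token, it can name an in-flight instruction without copying any of that instruction's state.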

Page 10: Closely-Coupled Timing-Directed Partitioning in HAsim

Implementing the Operations

See paper for details (also extra slides)

Page 11: Closely-Coupled Timing-Directed Partitioning in HAsim

Assessment: Three Timing Models

Unpipelined Target

MIPS R10K-like out-of-order superscalar

5-Stage Pipeline

Page 12: Closely-Coupled Timing-Directed Partitioning in HAsim

Assessment: Target Performance

Targets have idealized memory hierarchy

[Figure: bar chart "Target Processor CPI". Y-axis: Model Cycles per Instruction (CPI), 0 to 3.5. X-axis: median, multiply, qsort, towers, vvadd, average. Series: Unpipelined, 5-stage, Out-of-Order.]

Page 13: Closely-Coupled Timing-Directed Partitioning in HAsim

Assessment: Simulator Performance

Some correspondence between target and functional partition is very helpful

[Figure: bar chart "Simulation Rate". Y-axis: FPGA-Cycles per Model Cycle (FMR), 0 to 45. X-axis: median, multiply, qsort, towers, vvadd, average. Series: Unpipelined, 5-Stage, Out-of-Order.]

Page 14: Closely-Coupled Timing-Directed Partitioning in HAsim

Assessment: Reuse and Physical Stats

Where is functionality implemented:

[Table: one row per design (Unpipelined, 5-Stage, Out-of-Order) and one column per structure: IMem, Program Counter, Branch Predictor, Scoreboard/ROB, RegFile, Maptable/Freelist, ALU, DMem, Store Buffer, Snapshots/Rollback, Functional Partition. Only the N/A entries survive: five for Unpipelined, one for 5-Stage, none for Out-of-Order.]

FPGA usage:

                        Unpipelined   5-Stage       Out-of-Order
FPGA Slices             6599 (20%)    9220 (28%)    22,873 (69%)
Block RAMs              18 (5%)       25 (7%)       25 (7%)
Clock Speed             98.8 MHz      96.9 MHz      95.0 MHz
Average FMR             41.1          7.49          15.6
Simulation Rate         2.4 MHz       14 MHz        6 MHz
Average Simulator IPS   2.4 MIPS      5.1 MIPS      4.7 MIPS

Virtex IIPro 70

Using ISE 8.1i
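
Since FMR is defined as FPGA cycles per model cycle, the Clock Speed and Average FMR rows should roughly determine the Simulation Rate row. The sketch below works through that arithmetic with the values copied from the table. The function name and the division itself are an assumed reading of the table, not a formula stated on the slide.

```python
# Values copied from the "FPGA usage" table on this slide.
CLOCK_MHZ = {"Unpipelined": 98.8, "5-Stage": 96.9, "Out-of-Order": 95.0}
AVERAGE_FMR = {"Unpipelined": 41.1, "5-Stage": 7.49, "Out-of-Order": 15.6}

def simulation_rate_mhz(clock_mhz, fmr):
    """Millions of model cycles per second: FPGA clock divided by FMR."""
    return clock_mhz / fmr

rates = {
    name: simulation_rate_mhz(CLOCK_MHZ[name], AVERAGE_FMR[name])
    for name in CLOCK_MHZ
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} MHz")
```

Computed this way, the unpipelined and out-of-order targets round to the reported 2.4 MHz and 6 MHz, while the 5-stage target comes out near 12.9 MHz rather than the reported 14 MHz. The slide does not say how its rates were averaged across benchmarks, so a per-benchmark average is one possible explanation.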

Page 15: Closely-Coupled Timing-Directed Partitioning in HAsim

Future Work: Simulating Multicores

Scheme 1: Duplicate both partitions

[Figure: Timing Models A, B, C and D, each with its own Functional Reg State + Datapath, above a single Functional Memory State. Annotation: "Interaction occurs here".]

Scheme 2: Cluster Timing Partitions

[Figure: Timing Models A, B, C and D above a single Functional Reg State + Datapath and a single Functional Memory State. Annotations: "Interaction still occurs here"; "Use a context ID to reference all state lookups".]
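
The context-ID idea can be sketched as one set of functional state in which every register lookup takes an extra index. This is a minimal illustration, assuming per-context register files and a single shared memory; the class and method names are hypothetical and not taken from HAsim.

```python
class MultiplexedFunctionalState:
    def __init__(self, num_contexts, num_regs=8):
        # One register file per context, held in a single structure.
        self.regs = [[0] * num_regs for _ in range(num_contexts)]
        # Memory is shared (assumption): the simulated cores interact through it.
        self.memory = {}

    def read_reg(self, ctx, reg):
        return self.regs[ctx][reg]

    def write_reg(self, ctx, reg, value):
        self.regs[ctx][reg] = value

    def store(self, ctx, reg, addr):
        self.memory[addr] = self.regs[ctx][reg]

    def load(self, ctx, reg, addr):
        self.regs[ctx][reg] = self.memory.get(addr, 0)

state = MultiplexedFunctionalState(num_contexts=4)
state.write_reg(0, 1, 42)   # timing model A writes its own r1
state.store(0, 1, 0x100)    # A publishes the value through shared memory
state.load(3, 2, 0x100)     # timing model D reads it into its own r2
print(state.read_reg(3, 2), state.read_reg(1, 1))
```

One datapath serves every timing model here, and only the `ctx` argument selects whose registers an operation touches.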

Page 16: Closely-Coupled Timing-Directed Partitioning in HAsim

Future Work: Simulating Multicores

Scheme 3: Perform multiplexing of timing models themselves
• Leverage HAsim A-Ports in Timing Model
• Out of scope of today’s talk

[Figure: Timing Models A, B, C and D multiplexed above a single Functional Reg State + Datapath and a single Functional Memory State. Annotations: "Interaction still occurs here"; "Use a context ID to reference all state lookups".]

Page 17: Closely-Coupled Timing-Directed Partitioning in HAsim

Future Work: Unifying with the UT-FAST model

UT-FAST is Functional-First
• This can be unified into Timing-Directed
• Just do “execute-at-fetch”

[Figure: two diagrams in which a functional emulator running in software sends an execution stream to the FPGA and receives resteers. Other labels: Emulator, Func Partition, Timing Partition.]

Page 18: Closely-Coupled Timing-Directed Partitioning in HAsim

Summary

Described a scheme for closely-coupled timing-directed partitioning
• Both partitions are suitable for on-FPGA implementation

Demonstrated such a scheme’s benefits:
• Very Good Reuse, Very Good Area/Clock Speed
• Good FPGA-to-Model Cycle Ratio
• Caveat: Assuming some correspondence between timing model and functional partitions (recall the unpipelined target)

We plan to extend this using contexts for hardware multiplexing [Chung 07]

Future: rare complex operations (such as syscalls) could be done in software using virtual channels

Page 19: Closely-Coupled Timing-Directed Partitioning in HAsim

Questions?


Page 20: Closely-Coupled Timing-Directed Partitioning in HAsim

Extra Slides


Page 21: Closely-Coupled Timing-Directed Partitioning in HAsim

Functional Partition Fetch

Page 22: Closely-Coupled Timing-Directed Partitioning in HAsim

Functional Partition Decode

Page 23: Closely-Coupled Timing-Directed Partitioning in HAsim

Functional Partition Execute

Page 24: Closely-Coupled Timing-Directed Partitioning in HAsim

Functional Partition Back End

Page 25: Closely-Coupled Timing-Directed Partitioning in HAsim

Timing Model: Unpipelined

Page 26: Closely-Coupled Timing-Directed Partitioning in HAsim

5-Stage Pipeline Timing Model

Page 27: Closely-Coupled Timing-Directed Partitioning in HAsim

Out-Of-Order Superscalar Timing Model

