
Riscy Processors: A collection of open-source RISC-V processors

Andy Wright, Sizhuo Zhang, Thomas Bourgeat, Murali Vijayaraghavan, Jamey Hicks, Arvind
Computation Structures Group, CSAIL, MIT
4th RISC-V Workshop

July 25, 2016

MIT’s Riscy Expedition
Advisors: Profs. Arvind and Adam Chlipala
Motivations:
- Formal specification
- Formally verified processor implementations
- Memory consistency models
- Accelerators
- Microarchitectural exploration
- ASIC synthesis

⇒ need flexible RISC-V implementations

Chips with Proofs

Full RISC-V chip → full-chip proof: lots of effort required
RISC-V modules → modular proofs: less effort required

Build up processors and proofs modularly to reduce design and proof effort.

Current Riscy Offerings
Building blocks for processor design:
- Riscy Processor Library
- Riscy BSV Utility Library

Reference processor implementations:
- Multicycle
- In-order pipelined
- Out-of-order execution (to be released shortly)
- All implementations boot Linux with paged virtual memory

Infrastructure:
- Connectal
- Tandem verification

Processor Building Blocks
A flexible way of designing processors leveraging Bluespec System Verilog (BSV)

Riscy Proc Library:
- CSR File
- RegFile
- ALU
- FPU
- Multiply
- Divide
- Decode
- MMU
- Cache
- MemOpBridge

Riscy Util Library:
- ConcatReg
- PerfMonitor
- EHR
- Print Trace
- FIFOG
- Server Util
- Search FIFO
- RW BRAM

Connecting Modules

[Figure: modules connected through method definitions and external method calls, either by direct connection or through glue logic; modules can themselves act as glue logic.]

Modules with external method calls are open modules; those without are closed modules.

Initial Processor: Multicycle Basic – FSM without Caches

[Figure: block diagram with a Core (Decode, RegFile, ALU, FPU, CSR File, and extra logic), an MMU and a MemOpBridge, FenceReq and VMInfo signals, and a Memory System connected to Main Memory IO.]

Initial Processor: Multicycle – Adding Caches and TLBs

[Figure: the same block diagram as the basic multicycle design (Core with Decode, RegFile, ALU, FPU, CSR File, and extra logic; MemOpBridge; FenceReq; VMInfo; Memory System; Main Memory IO), with the MMU now containing an ITLB, I$, DTLB, and D$.]

Initial Processor: Multicycle Split – Front and Back FSMs

[Figure: the core split into a Front End (ITLB, I$, Decode, and extra logic) and a Back End (RegFile, ALU, FPU, CSR File, DTLB, D$, FenceReq, VMInfo, and extra logic), both attached to the Memory System and Main Memory IO.]

Processor Implementations: In-Order Pipelined

[Figure: Front End and Back End attached to the Memory System and Main Memory IO]
- Front end: pipelined fetch, branch prediction, and decode
- Back end: pipelined register read, execute, memory, and write

Processor Implementations: Out-of-Order Execution

[Figure: Front End and Back End attached to the Memory System and Main Memory IO]
- Back end: register renaming, issue, dispatch, execute, and commit

Processor Implementations: Out-of-Order Execution – Back-End

[Figure: back-end block diagram with Register Renaming, a Reorder Buffer and Commit, a Reservation Station, several Execute Rules blocks, Finish, a Physical Reg File, a Ld/St Q, a DTLB and D$, and the CSR File.]

How is modular design possible?
RTL modules are not modularly refinable under composition:
- Implementation details of one module may put additional constraints on another module

Bluespec System Verilog supports composability through:
- Interface method abstraction
- Implicit guards on methods
- Guarded atomic actions

Bluespec System Verilog (BSV)
High-level synthesis language:
- Execution model built upon guarded atomic actions (called rules)
- A rule can only change the state of a module if its guard (or condition) is true

The compiler adds logic to specify when rules fire:
- A rule fires only if its explicit guard and the implicit guards of all the module methods it calls are satisfied
- Rules must appear to fire atomically (or sequentially: one-rule-at-a-time semantics)
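The rule semantics above can be illustrated with a small Python behavioral model (not BSV, and not how the Bluespec compiler builds hardware). `Rule`, `GuardFailed`, `pop_queue`, and `run_one_at_a_time` are hypothetical names. A rule is tried only when its explicit guard holds; a called method raises `GuardFailed` to model a false implicit guard. The real compiler folds implicit guards into the rule's firing condition; this sketch instead discards the rule's tentative updates, which gives the same observable all-or-nothing behavior.

```python
import copy
from dataclasses import dataclass
from typing import Callable, List


class GuardFailed(Exception):
    """Raised by a method whose implicit guard is false."""


@dataclass
class Rule:
    name: str
    guard: Callable[[dict], bool]   # explicit guard
    body: Callable[[dict], None]    # action; may call guarded methods


def pop_queue(state: dict) -> int:
    """A 'method' with an implicit guard: the queue must be non-empty."""
    if not state["queue"]:
        raise GuardFailed("queue empty")
    return state["queue"].pop(0)


def run_one_at_a_time(state: dict, rules: List[Rule], rounds: int) -> List[str]:
    """Try each rule in turn; a rule's updates commit only if every guard held."""
    fired: List[str] = []
    for _ in range(rounds):
        for rule in rules:
            if not rule.guard(state):
                continue                      # explicit guard false
            shadow = copy.deepcopy(state)     # tentative copy of all state
            try:
                rule.body(shadow)
            except GuardFailed:
                continue                      # implicit guard false: discard updates
            state.clear()
            state.update(shadow)              # commit the whole action at once
            fired.append(rule.name)
    return fired


def consume(s: dict) -> None:
    s["total"] += pop_queue(s)
    s["count"] += 1


def consume_eager(s: dict) -> None:
    s["count"] += 1                 # this update is discarded
    s["total"] += pop_queue(s)      # if this guarded call fails


state = {"queue": [5, 7, 9], "total": 0, "count": 0}
fired = run_one_at_a_time(
    state, [Rule("consume", lambda s: s["count"] < 2, consume)], rounds=4)

state2 = {"queue": [], "total": 0, "count": 0}
fired2 = run_one_at_a_time(
    state2, [Rule("consume_eager", lambda s: True, consume_eager)], rounds=3)
```

`consume` fires twice and then stops because its explicit guard becomes false; `consume_eager` never fires, and its early write to `count` is never visible, which is the atomicity guarantee.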

FIFO Interface

    interface FIFO#(type t);
        method Action enq(t x);
        method t first;
        method Action deq;
    endinterface

[Figure: FIFO module exposing enq, first, and deq]

Implicit guards prevent enqueuing into a full FIFO or dequeuing from an empty FIFO.
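A minimal Python model of the FIFO interface above, assuming a fixed depth. `GuardedFIFO`, `enq_ready`, and `deq_ready` are hypothetical names; in BSV a rule simply cannot fire while a method it calls is not ready, whereas here an exception stands in for that. Note that `first` and `deq` share the same not-empty guard.

```python
from collections import deque
from typing import Deque, Generic, TypeVar

T = TypeVar("T")


class GuardFailed(Exception):
    """A method was called while its implicit guard was false."""


class GuardedFIFO(Generic[T]):
    """Behavioral model of FIFO#(t): enq, first, and deq with implicit guards."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self._items: Deque[T] = deque()

    def enq_ready(self) -> bool:        # implicit guard of enq: not full
        return len(self._items) < self.depth

    def deq_ready(self) -> bool:        # implicit guard of first and deq: not empty
        return len(self._items) > 0

    def enq(self, x: T) -> None:
        if not self.enq_ready():
            raise GuardFailed("enq on a full FIFO")
        self._items.append(x)

    def first(self) -> T:
        if not self.deq_ready():
            raise GuardFailed("first on an empty FIFO")
        return self._items[0]

    def deq(self) -> None:
        if not self.deq_ready():
            raise GuardFailed("deq on an empty FIFO")
        self._items.popleft()


f: GuardedFIFO[int] = GuardedFIFO(depth=2)
f.enq(1)
f.enq(2)
try:
    f.enq(3)                            # guard false: the FIFO is full
    full_rejected = False
except GuardFailed:
    full_rejected = True
head = f.first()
f.deq()
f.deq()
empty_ready = f.deq_ready()
```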

Connecting Modules with Rules

[Figure: rule "Move pkt B1 to B2" calls deq on Buffer1 and enq on Buffer2; rule "Kill pkt w/ id" calls kill on both buffers.]

A packet will be killed either before the move (in Buffer 1) or after the move (in Buffer 2). Suppose neither buffer kills the packet that is currently being dequeued/enqueued:
- In Verilog this is an atomicity bug: a packet that moves in the same cycle as the kill escapes it
- The Bluespec compiler introduces extra logic to prevent concurrent Move and Kill
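The Move/Kill bug can be reproduced with a short Python sketch, assuming each buffer holds at most one packet id. `concurrent_naive` models hand-written RTL in which both rules act in the same cycle on start-of-cycle values and neither buffer checks the packet in flight; `sequential` models one-rule-at-a-time semantics. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Slot = Optional[int]   # a packet id, or None when the one-entry buffer is empty


def move(b1: Slot, b2: Slot) -> Tuple[Slot, Slot]:
    """Rule Move: deq from Buffer 1, enq into Buffer 2."""
    return None, b1


def kill(b1: Slot, b2: Slot, pid: int) -> Tuple[Slot, Slot]:
    """Rule Kill: each buffer invalidates its own entry if the id matches."""
    return (None if b1 == pid else b1), (None if b2 == pid else b2)


def concurrent_naive(b1: Slot, b2: Slot, pid: int) -> Tuple[Slot, Slot]:
    """Move and Kill in the same cycle on start-of-cycle values.
    Assumes Move can fire (b1 full, b2 empty)."""
    new_b1 = None      # emptied by deq (and by kill, if it matched)
    new_b2 = b1        # enq; Buffer 2's kill only checked its old (empty) entry
    return new_b1, new_b2


def sequential(b1: Slot, b2: Slot, pid: int, order: List[str]) -> Tuple[Slot, Slot]:
    """One-rule-at-a-time semantics: apply the rules in the given order."""
    for rule in order:
        if rule == "move":
            if b1 is not None and b2 is None:   # Move's implicit guards
                b1, b2 = move(b1, b2)
        else:
            b1, b2 = kill(b1, b2, pid)
    return b1, b2


naive = concurrent_naive(7, None, 7)                 # packet 7 escapes the kill
kill_first = sequential(7, None, 7, ["kill", "move"])
move_first = sequential(7, None, 7, ["move", "kill"])
```

Either sequential order kills packet 7; only the unsynchronized concurrent version lets it reach Buffer 2.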

Processor Design Flow

Riscy blocks → initial connections → modular refinement (focus on functionality) → scheduling optimization → specialized processor for desired application

Scheduling optimization: BSV scheduling logic may be hurting performance, so apply EHR transformations for performance.

Scheduling Optimization: Introduction

[Figure: the same Buffer1/Buffer2 example with rules "Move pkt B1 to B2" and "Kill pkt w/ id"]

To enable Move and Kill to fire concurrently, they need an apparent ordering (or schedule):
- If the buffers’ methods don’t support the schedule, use EHR refinement within the buffers to achieve the necessary scheduling


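Buffer 1 with a plain register:

    method deq if (r != invalid);
        r <= invalid;
        return r;
    endmethod

    method kill(tag);
        if (r.tag == tag)
            r <= invalid;
    endmethod

[Figure: Buffer 1 built from a single register r; deq and kill both read and write it.]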


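Buffer 1 refined with an EHR:

    method deq if (r[1] != invalid);
        r[1] <= invalid;
        return r[1];
    endmethod

    method kill(tag);
        if (r[0].tag == tag)
            r[0] <= invalid;
    endmethod

[Figure: Buffer 1 built from an EHR r; kill reads and writes port r[0], deq reads and writes port r[1].]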

For more information about EHRs, see “The Ephemeral History Register: Flexible Scheduling for Rule-Based Designs” by Daniel Rosenband
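With the plain register in the first version of Buffer 1, deq and kill both read and write r, and a register read always returns the start-of-cycle value, so neither method can observe the other's write and the two cannot be given an apparent order within one cycle. The Python sketch below is a cycle-level behavioral model of the EHR version, assuming the usual EHR read/write rules summarized in the docstring; `Ehr`, `Buffer1`, `deq_ready`, and `end_cycle` are hypothetical names. Because kill uses port 0 and deq uses port 1, the apparent schedule is Kill before Move.

```python
from typing import Any, Dict, Optional, Tuple

INVALID = None                          # stands in for the slide's `invalid`


class Ehr:
    """Cycle-level model of an n-port Ephemeral History Register.

    A read at port i returns the value written this cycle by the highest
    port below i, or the stored value if no lower port has written.  At the
    end of the cycle, the highest-numbered write becomes the stored value.
    """

    def __init__(self, n: int, init: Any) -> None:
        self.n = n
        self.value = init
        self._writes: Dict[int, Any] = {}

    def read(self, port: int) -> Any:
        lower = [p for p in self._writes if p < port]
        return self._writes[max(lower)] if lower else self.value

    def write(self, port: int, v: Any) -> None:
        assert 0 <= port < self.n
        self._writes[port] = v

    def end_cycle(self) -> None:
        if self._writes:
            self.value = self._writes[max(self._writes)]
        self._writes.clear()


Packet = Optional[Tuple[int, str]]      # (tag, payload) or INVALID


class Buffer1:
    """One-entry buffer: kill uses port 0 and deq uses port 1, so Kill < Move."""

    def __init__(self, pkt: Packet) -> None:
        self.r = Ehr(2, pkt)

    def kill(self, tag: int) -> None:
        cur = self.r.read(0)
        if cur is not INVALID and cur[0] == tag:
            self.r.write(0, INVALID)

    def deq_ready(self) -> bool:        # implicit guard of deq
        return self.r.read(1) is not INVALID

    def deq(self) -> Packet:
        pkt = self.r.read(1)
        self.r.write(1, INVALID)
        return pkt

    def end_cycle(self) -> None:
        self.r.end_cycle()


# Same cycle: Kill(3) fires first, then Move tries to deq.
b = Buffer1((3, "add"))
b.kill(3)
moved = b.deq() if b.deq_ready() else None
b.end_cycle()

# Same cycle, but Kill targets a different tag, so the packet moves.
b2 = Buffer1((4, "mul"))
b2.kill(3)
moved2 = b2.deq() if b2.deq_ready() else None
b2.end_cycle()
```

The deq port sees kill's write through the EHR's internal bypass, so a killed packet is never moved, and both rules can still fire in the same cycle.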

Processor Interface

[Figure: mkProc and its interface methods, connecting the processor to devices, accelerators, and infrastructure, and to debugging]
- External Interrupt
- Control/Configuration
- Get Verification Packet
- MMIO Req/Resp
- HTIF ToHost/FromHost
- Main Memory Req/Resp

Connectal Infrastructure: PCIe FPGA Boards

[Figure: a VC707 FPGA board runs the RV64G BSV processor with DRAM and connects over PCIe to a host computer (Connectal implements this connection). The host runs HTIF/Fesvr, Spike tandem verification, test program loading, and device emulation.]

Connectal Infrastructure: Simulation

[Figure: the RV64G BSV processor and DRAM run in Bluesim on the host computer and connect over a Unix socket (Connectal implements this connection) to host software running HTIF/Fesvr, Spike tandem verification, and device emulation.]

Connectal Infrastructure: Zynq Chips

[Figure: the RV64G BSV processor runs in the Zynq FPGA fabric with DRAM and connects over AXI to the Zynq ARM core, which runs HTIF/Fesvr, Spike tandem verification, and device emulation (the same code as the host in the previous slide).]

Tandem Verification

Run the same program on two RISC-V implementations at once:
- Generate verification packets at the commit stage
- Use non-deterministic information from the implementation under test for synchronization
- Compare results

[Figure: the Riscy Proc commit stage sends verification packets (pc, instruction, data, exceptions, etc.) to Spike, which synchronizes, simulates, and compares.]
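The tandem-verification loop can be sketched in Python under simplifying assumptions: a hypothetical two-instruction toy ISA (not RISC-V) stands in for the real instruction set, and a small `ReferenceModel` plays Spike's role. `VerificationPacket`, `ReferenceModel`, and `tandem_check` are hypothetical names. The `mmio_load` case shows synchronization: a device value the reference model cannot predict is taken from the implementation under test instead of being computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ISA (not RISC-V): ("addi", rd, rs, imm) or ("mmio_load", rd).
Instr = Tuple


@dataclass
class VerificationPacket:
    pc: int
    instr: Instr
    data: Optional[int]                  # value written to rd
    exception: Optional[str] = None      # carried along; not checked in this sketch


class ReferenceModel:
    """Golden model playing Spike's role in this sketch."""

    def __init__(self, program: List[Instr]) -> None:
        self.program = program
        self.pc = 0
        self.regs: Dict[int, int] = {i: 0 for i in range(4)}

    def step(self, pkt: VerificationPacket) -> Optional[str]:
        """Execute one instruction and compare; return an error message or None."""
        instr = self.program[self.pc // 4]
        if pkt.pc != self.pc or pkt.instr != instr:
            return f"pc/instr mismatch at {self.pc:#x}"
        if instr[0] == "addi":
            _, rd, rs, imm = instr
            expected = self.regs[rs] + imm
        elif instr[0] == "mmio_load":
            _, rd = instr
            # Non-deterministic: the device value cannot be predicted, so the
            # model synchronizes by taking it from the implementation under test.
            expected = pkt.data
        else:
            return f"unknown instruction {instr!r}"
        if pkt.data != expected:
            return f"data mismatch at {self.pc:#x}: got {pkt.data}, expected {expected}"
        self.regs[rd] = expected
        self.pc += 4
        return None


def tandem_check(program: List[Instr], packets: List[VerificationPacket]) -> str:
    """Feed commit-stage packets to the reference model one at a time."""
    ref = ReferenceModel(program)
    for pkt in packets:
        error = ref.step(pkt)
        if error is not None:
            return error
    return "ok"


program = [("addi", 1, 0, 5), ("mmio_load", 2), ("addi", 3, 2, 1)]
good = [VerificationPacket(0, program[0], 5),
        VerificationPacket(4, program[1], 42),
        VerificationPacket(8, program[2], 43)]
bad = good[:2] + [VerificationPacket(8, program[2], 44)]
```

Checking `good` reports no mismatch even though the model never "knew" the MMIO value 42; checking `bad` flags the wrong result at the third committed instruction.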

Project Status
Available now:
- Riscy Processor Library
- Riscy BSV Utility Library
- Example multicycle processors
- https://github.com/csail-csg/riscy

Coming soon:
- High-performance in-order processor
- Out-of-order processor and modules
- Multicore processors

Planned work:
- Formal specifications
- Proofs for modules
- Proofs for processors

Questions?


Backup Slides


Scheduling Optimization: In Processor Design Flow

Choosing schedules:
- Typically close to reverse pipeline order, but very dependent on microarchitecture

Adapting modules to schedule:
- Registers that prevent the desired schedule should be replaced with EHRs
- EHRs add bypass paths between rules without breaking atomicity
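Why schedules tend toward reverse pipeline order can be seen in a small Python sketch of a hypothetical 3-stage pipeline (fetch, decode, execute) whose stages communicate through plain pipeline registers `f2d` and `d2e`. Under one-rule-at-a-time semantics a later rule sees an earlier rule's writes. In reverse order, each stage reads what its predecessor wrote in the previous cycle, which is what hardware registers provide; in forward order an instruction would pass through every stage within one cycle, which a plain register cannot provide. `PipeState` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PipeState:
    pc: int = 0
    f2d: Optional[str] = None   # fetch -> decode pipeline register
    d2e: Optional[str] = None   # decode -> execute pipeline register


def simulate(order: List[str], program: List[str], cycles: int) -> List[Tuple[int, str]]:
    """Fire the three stage rules once per cycle, one at a time, in `order`.

    A rule that fires later in a cycle sees the writes of rules that fired
    earlier in the same cycle (one-rule-at-a-time semantics).
    Returns (cycle, instruction) for every instruction that executes.
    """
    s = PipeState()
    completed: List[Tuple[int, str]] = []

    def fetch(cycle: int) -> None:
        s.f2d = program[s.pc] if s.pc < len(program) else None
        s.pc += 1

    def decode(cycle: int) -> None:
        s.d2e = s.f2d

    def execute(cycle: int) -> None:
        if s.d2e is not None:
            completed.append((cycle, s.d2e))

    rules = {"fetch": fetch, "decode": decode, "execute": execute}
    for cycle in range(cycles):
        for name in order:
            rules[name](cycle)
    return completed


program = ["i0", "i1", "i2"]
forward = simulate(["fetch", "decode", "execute"], program, cycles=5)
reverse = simulate(["execute", "decode", "fetch"], program, cycles=5)
```

With the reverse order, i0 executes two cycles after it is fetched and one instruction completes per cycle, like a register-based pipeline. With the forward order, every instruction completes in the cycle it is fetched; getting that behavior in hardware would require bypass paths, which is what replacing the registers with EHRs adds.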


Scheduling Optimization: Consequences of Schedule Choice

Location of concurrency logic:
- In the Move and Kill example, which buffer is in charge of killing moving instructions?
- One location may be more efficient than another

Existence of bypassing logic to skip latency:
- Scheduling can allow some redirecting rules to update the fetch PC in the same cycle while other redirecting rules update the fetch PC for the next cycle

Priority between consumers of shared resources:
- Scheduling can determine priority for ports in an arbiter
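The last point can be sketched with a Python model in which two hypothetical rules, `dcache_miss` and `icache_miss`, share a port that accepts one request per cycle. The port's implicit guard becomes false once a request is taken, so whichever rule comes earlier in the schedule wins that cycle: in this model, the schedule order acts as a fixed priority. `SinglePort` and `arbitrate` are hypothetical names.

```python
from typing import Dict, List, Optional


class SinglePort:
    """A shared resource that accepts one request per cycle."""

    def __init__(self) -> None:
        self.taken_by: Optional[str] = None

    def ready(self) -> bool:            # implicit guard of request()
        return self.taken_by is None

    def request(self, who: str) -> None:
        assert self.ready(), "request() called while its guard is false"
        self.taken_by = who

    def end_cycle(self) -> Optional[str]:
        winner, self.taken_by = self.taken_by, None
        return winner


def arbitrate(schedule: List[str], pending: Dict[str, int], cycles: int) -> List[Optional[str]]:
    """Each cycle, try the rules in schedule order; a rule fires if it has a
    pending request and the port's guard is still true.  Returns the rule
    granted the port in each cycle (None if idle)."""
    pending = dict(pending)             # don't mutate the caller's dict
    port = SinglePort()
    grants: List[Optional[str]] = []
    for _ in range(cycles):
        for rule in schedule:
            if pending[rule] > 0 and port.ready():
                port.request(rule)
                pending[rule] -= 1
        grants.append(port.end_cycle())
    return grants


requests = {"dcache_miss": 2, "icache_miss": 1}
d_first = arbitrate(["dcache_miss", "icache_miss"], requests, cycles=4)
i_first = arbitrate(["icache_miss", "dcache_miss"], requests, cycles=4)
```

The same requests are served in a different order depending only on which rule the schedule places first.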


Scheduling Optimization: Out-of-Order Execution Example

[Figure: Insert, Execute, and Finish rules around a Reservation Station, a Busy Table, and the RegFile; Execute sends instructions to a functional unit and Finish receives results from it.]

- Ordering of Insert and Finish determines where the bypass path is implemented
- Ordering of Insert and Execute determines whether an instruction can be executed in the same cycle it is inserted into the reservation station
- Ordering of Execute and Finish determines whether an instruction can be executed in the same cycle its registers become ready
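The second and third orderings can be checked with a small Python model of a reservation station, assuming one Insert, one Finish, and at most one issued instruction per cycle. `RsModel`, `run`, and the scenario helpers are hypothetical. The model reads the busy table at select time rather than storing per-entry ready bits, so it does not capture where the Insert/Finish bypass is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, List[str]]           # (instruction, source registers)


@dataclass
class RsModel:
    busy: Set[str]                      # busy table: registers not yet ready
    arrivals: Dict[int, Entry]          # cycle -> instruction arriving at Insert
    finishes: Dict[int, str]            # cycle -> register written back by Finish
    rs: List[Entry] = field(default_factory=list)
    issued: List[Tuple[int, str]] = field(default_factory=list)

    def insert(self, cycle: int) -> None:
        if cycle in self.arrivals:
            self.rs.append(self.arrivals[cycle])

    def finish(self, cycle: int) -> None:
        if cycle in self.finishes:
            self.busy.discard(self.finishes[cycle])

    def execute(self, cycle: int) -> None:
        # Issue the oldest entry whose sources are all ready (at most one per cycle).
        for i, (name, srcs) in enumerate(self.rs):
            if not any(src in self.busy for src in srcs):
                self.issued.append((cycle, name))
                del self.rs[i]
                return


def run(order: List[str], model: RsModel, cycles: int) -> List[Tuple[int, str]]:
    """Fire the rules once per cycle, one at a time, in `order`."""
    for cycle in range(cycles):
        for rule in order:
            getattr(model, rule)(cycle)
    return model.issued


def scenario_a(order: List[str]) -> List[Tuple[int, str]]:
    # "add" arrives in cycle 0 with its source already ready.
    return run(order, RsModel(busy=set(), arrivals={0: ("add", ["p0"])}, finishes={}), cycles=3)


def scenario_b(order: List[str]) -> List[Tuple[int, str]]:
    # "mul" waits on p1, which is written back in cycle 2.
    return run(order, RsModel(busy={"p1"}, arrivals={0: ("mul", ["p1"])}, finishes={2: "p1"}), cycles=4)


a1 = scenario_a(["insert", "execute"])             # Insert before Execute
a2 = scenario_a(["execute", "insert"])             # Execute before Insert
b1 = scenario_b(["insert", "finish", "execute"])   # Finish before Execute
b2 = scenario_b(["insert", "execute", "finish"])   # Execute before Finish
```

Placing Insert before Execute lets `add` issue in the cycle it arrives; placing Finish before Execute lets `mul` issue in the cycle its source becomes ready. The opposite orders each cost one cycle.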

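Connecting Modules

[Figure: rule "Move value" deqs from Buffer1 and enqs into Buffer 2, while rule "Increment" calls inc on both buffers.]

The implementations of these two buffers are not independent in typical HDLs, which prevents modular design and refinement. If Move and Increment act in the same cycle, the total increment applied to the moving value depends on both buffers:

                                    Buffer 2 enq: don’t increment   Buffer 2 enq: increment
    Buffer 1 deq: don’t increment   +0                              +1
    Buffer 1 deq: increment         +1                              +2

Only the combinations in which exactly one buffer increments the moving value give the correct +1, so each buffer’s design depends on the other’s.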
