Riscy Processors
A collection of open-sourced RISC-V processors
Andy Wright, Sizhuo Zhang, Thomas Bourgeat, Murali Vijayaraghavan, Jamey Hicks, Arvind
Computation Structures Group, CSAIL, MIT
4th RISC-V Workshop
July 25, 2016
MIT’s Riscy Expedition
Advisors: Profs. Arvind and Adam Chlipala
Motivations:
- Formal Specification
- Formally Verified Processor Implementations
- Memory Consistency Models
- Accelerators
- Microarchitectural Exploration
- ASIC Synthesis
⇒ need flexible RISC-V Implementations
Chips with Proofs
Full RISC-V Chip
Full Chip Proof
Lots of effort required
RISC-V Modules
Modular Proofs
Less effort required
Build up processors and proofs modularly to reduce design and proof effort
Current Riscy Offerings
Building Blocks for Processor Design:
- Riscy Processor Library
- Riscy BSV Utility Library
Reference Processor Implementations:
- Multicycle
- In-Order Pipelined
- Out-of-Order Execution (to be released shortly)
- All implementations boot Linux w/ paged virtual memory
Infrastructure:
- Connectal
- Tandem Verification
A flexible way of designing processors leveraging Bluespec System Verilog (BSV)
Processor Building Blocks
Riscy Proc Library
CSR File
RegFile
ALU
FPU Multiply
Divide
Decode
MMU Cache
MemOpBridge
Riscy Util Library
ConcatReg
PerfMonitor
EHR
Print Trace FIFOG
Server Util
Search FIFO
RW BRAM
Connecting Modules
Method Definition
External Method Call
Connections through glue logic
Direct connection
Modules can act as glue logic
Modules with external method calls are open modules. Those without are closed modules.
Initial Processor
Multicycle Basic – FSM without Caches
MMU MemOpBridge
FenceReq
CSR File
RegFile ALU
FPU
Memory System
MemOpBridge
Main Memory IO
MMU
Decode
Core
extra logic
VMInfo
Initial Processor
Multicycle – Adding Caches and TLBs
MMU MemOpBridge
FenceReq
Memory System
MemOpBridge
Main Memory IO
MMU
ITLB I$ DTLB D$
CSR File
RegFile ALU
FPU
Decode
Core
extra logic
VMInfo
Initial Processor
Multicycle Split – Front and Back FSMs
DTLB D$
FenceReq
CSR File
VMInfo
RegFile ALU
FPU
Memory System
I$
Main Memory IO
ITLB
Decode
Front End
extra logic
extra logic
Back End
Pipelined Register Read, Execute, Memory, and Write
Pipelined Fetch, Branch Prediction, and Decode
Processor Implementations
In-Order Pipelined
Memory System
Main Memory IO
Front End: Pipelined Fetch, Branch Prediction, and Decode
Back End: Pipelined Register Read, Execute, Memory, and Write
Processor Implementations
Out-of-Order Execution
Memory System
Main Memory IO
Register Renaming, Issue, Dispatch, Execute, and Commit
Front End Back End
Processor Implementations
Out-of-Order Execution – Back-End
Reg Renaming
Execute Rules
Finish
DTLB D$
Commit
Reorder Buffer
Reservation Station
Ld/St Q
CSR File
Execute Rules
Execute Rules
Physical Reg File
How is modular design possible?
RTL modules are not modularly refinable under composition
- Implementation details of one module may put additional constraints on another module
Bluespec System Verilog supports composability through:
- Interface method abstraction
- Implicit guards on methods
- Guarded Atomic Actions
Bluespec System Verilog (BSV)
High-Level Synthesis language
- Execution model built upon guarded atomic actions (called rules)
- A rule can only change the state of a module if its guard (or condition) is true
The compiler adds logic to specify when rules fire
- A rule fires only when its explicit guard and the implicit guards of all module methods it calls are satisfied
- Rules must appear to fire atomically (or sequentially: one-rule-at-a-time semantics)
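The two kinds of guard can be seen side by side in a small sketch. The module and rule names here (`mkGuardDemo`, `produce`, `consume`) are made up for illustration and do not come from the Riscy libraries; only `mkReg`, `mkFIFO`, and the `FIFO` package are standard BSV.

```bsv
import FIFO::*;

// Hypothetical demo module, not part of Riscy.
(* synthesize *)
module mkGuardDemo(Empty);
    Reg#(Bit#(8))  count <- mkReg(0);
    FIFO#(Bit#(8)) q     <- mkFIFO;

    // Explicit guard: (count < 10).
    // Implicit guard: q.enq is only ready while q is not full.
    // The rule fires only when both hold; the enqueue and the
    // increment then take effect together, atomically.
    rule produce (count < 10);
        q.enq(count);
        count <= count + 1;
    endrule

    // No explicit guard. The implicit guards of q.first and q.deq
    // keep this rule from firing while q is empty.
    rule consume;
        $display("got %0d", q.first);
        q.deq;
    endrule
endmodule
```

Neither rule tests for full or empty itself; the compiler derives each rule's firing condition from the guards of the methods it calls.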
FIFO Interface
interface FIFO#(type t);
    method Action enq(t x);
    method t first;
    method Action deq;
endinterface
FIFO: enq, first, deq
Implicit guards prevent enqueuing into a full FIFO or dequeuing from an empty FIFO
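As a minimal sketch of where those implicit guards come from, here is a one-element FIFO written against the same three methods. The names `SimpleFifo` and `mkSimpleFifo1` are hypothetical, chosen to avoid clashing with the library `FIFO` interface; this is not the Riscy or Bluespec library implementation.

```bsv
// Hypothetical interface mirroring the three methods on the slide.
interface SimpleFifo#(type t);
    method Action enq(t x);
    method t first;
    method Action deq;
endinterface

// One-element FIFO: a data register plus a valid bit.
module mkSimpleFifo1(SimpleFifo#(t)) provisos (Bits#(t, tSz));
    Reg#(t)    slot <- mkRegU;
    Reg#(Bool) full <- mkReg(False);

    // Implicit guard (!full): callers cannot enqueue into a full FIFO.
    method Action enq(t x) if (!full);
        slot <= x;
        full <= True;
    endmethod

    // Implicit guard (full): callers cannot read from an empty FIFO.
    method t first if (full);
        return slot;
    endmethod

    // Implicit guard (full): callers cannot dequeue from an empty FIFO.
    method Action deq if (full);
        full <= False;
    endmethod
endmodule
```

With only plain registers, `enq` and `deq` in this sketch cannot be ready in the same cycle, which is the kind of limitation the EHR refinement later in the deck addresses.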
Connecting Modules with Rules
Buffer1
Buffer2deq enq
kill kill
A packet will be killed either before the move in buffer 1, or after the move in buffer 2
Suppose neither buffer kills the packet currently being enqueued/dequeued
- Atomicity bug in Verilog
- Bluespec compiler introduces extra logic to prevent concurrent move and kill
Move: pkt B1 to B2
Kill: pkt w/ id
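The Move and Kill rules might be written roughly as follows. Everything named here is an assumption made for illustration: the `Tag` and `Packet` types, the `Buffer` interface, the `killReq` register that holds a pending kill request, and the module name `mkMoveKill`.

```bsv
// Hypothetical types and interface for the two buffers.
typedef Bit#(4) Tag;
typedef struct {
    Tag      tag;
    Bit#(32) payload;
} Packet deriving (Bits, Eq);

interface Buffer;
    method Action enq(Packet p);
    method ActionValue#(Packet) deq;
    method Action kill(Tag t);
endinterface

// Glue module: owns no buffer state, only the rules connecting them.
module mkMoveKill#(Buffer b1, Buffer b2, Reg#(Maybe#(Tag)) killReq)(Empty);

    // Move: dequeue from buffer 1 and enqueue into buffer 2 as one
    // atomic action. It fires only when both methods are ready.
    rule doMove;
        let p <- b1.deq;
        b2.enq(p);
    endrule

    // Kill: remove the packet with the requested tag from whichever
    // buffer currently holds it.
    rule doKill (killReq matches tagged Valid .t);
        b1.kill(t);
        b2.kill(t);
        killReq <= tagged Invalid;
    endrule
endmodule
```

Because each rule must appear to execute atomically, a packet is seen by `doKill` either in buffer 1 (before the move) or in buffer 2 (after it). If the buffers' methods cannot be ordered within a cycle, the compiler keeps the two rules from firing together instead of letting a packet slip between them.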
Processor Design Flow
Riscy Blocks
Initial Connections
Modular Refinement
BSV scheduling logic may be hurting performance
EHR transformations for performance
Specialized processor for desired application
Scheduling Optimization
focus on functionality
Scheduling Optimization
Introduction
Buffer1
Buffer2deq enq
kill kill
To enable Move and Kill to fire concurrently, they need an apparent ordering (or schedule)
- If the buffer’s methods don’t support the schedule, use EHR refinement within the buffers to achieve the necessary scheduling
Move: pkt B1 to B2
Kill: pkt w/ id
Scheduling Optimization
method deq if (r != invalid);
    r <= invalid;
    return r;
method kill(tag);
    if (r.tag == tag)
        r <= invalid;
deq
kill
Reg rr w
Buffer 1
Scheduling Optimization
method deq if (r[1] != invalid);
    r[1] <= invalid;
    return r[1];
method kill(tag);
    if (r[0].tag == tag)
        r[0] <= invalid;
deq
kill
Ehr rr w
Buffer 1
r w
For more information about EHRs, see “The Ephemeral History Register: Flexible Scheduling for Rule-Based Designs” by Daniel Rosenband
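A fuller version of the Buffer 1 pseudocode could look like the sketch below. It assumes Bluespec's concurrent-register primitive `mkCReg` as a stand-in for the `Ehr` type on the slide; the two are expected to behave alike here, in that a write through port i is visible to reads through any higher-numbered port in the same cycle. The `Tag` and `Packet` types, the `Buffer1` interface, and the `enq` method on port 2 are hypothetical additions so the example is complete.

```bsv
// Hypothetical packet type, not taken from the Riscy sources.
typedef Bit#(4) Tag;
typedef struct {
    Tag      tag;
    Bit#(32) payload;
} Packet deriving (Bits, Eq);

interface Buffer1;
    method Action kill(Tag t);
    method ActionValue#(Packet) deq;
    method Action enq(Packet p);
endinterface

module mkBuffer1(Buffer1);
    // Three-ported concurrent register standing in for the slide's EHR.
    Reg#(Maybe#(Packet)) r[3] <- mkCReg(3, tagged Invalid);

    // Port 0: kill is logically first in the cycle.
    method Action kill(Tag t);
        if (r[0] matches tagged Valid .p &&& p.tag == t)
            r[0] <= tagged Invalid;
    endmethod

    // Port 1: deq sees the effect of a kill in the same cycle, so a
    // packet killed this cycle is never handed on to the next buffer.
    method ActionValue#(Packet) deq if (isValid(r[1]));
        r[1] <= tagged Invalid;
        return validValue(r[1]);
    endmethod

    // Port 2 (assumed, not on the slide): enq sees the slot freed by
    // a deq or kill in the same cycle.
    method Action enq(Packet p) if (!isValid(r[2]));
        r[2] <= tagged Valid p;
    endmethod
endmodule
```

The port numbers encode the schedule kill < deq < enq. Choosing a different apparent ordering would mean assigning the ports differently, without touching the rules that call these methods.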
Processor Interface
mkProc External Interrupt
Control/Configuration
Get Verification Packet
MMIO Req/Resp
HTIF ToHost/FromHost
Main Memory Req/Resp
Processor Devices, Accelerators, and Infrastructure
Debugging Methods
Connectal Infrastructure
PCIe FPGA Boards
VC707
FPGA
RV64G BSV
PCIe
Connectal implements this connection
DRAM
HTIF/Fesvr
Spike Tandem
Verification
Host Computer
Test Program Loading
Device Emulation
Connectal Infrastructure
Simulation
HTIF/Fesvr
Spike Tandem
Verification
Host Computer
Test Program Loading
Device Emulation
Host Computer
Bluesim
RV64G BSV
Connectal implements this connection
Unix Socket
DRAM
Connectal Infrastructure
Zynq Chips
HTIF/Fesvr
Spike Tandem
Verification
Zynq ARM core
Test Program Loading
Device Emulation
ARM core running same code as host in previous slide
AXI
Zynq FPGA Fabric
RV64G BSV
DRAM
Tandem Verification
Run the same program on two RISC-V implementations at once
- Generate verification packets at commit stage
- Use non-deterministic information from the implementation under test for synchronization
- Compare results
Riscy Proc
Spike
Verification Packets
pc, instruction, data, exceptions, etc.
Synchronize
Simulate
Compare
Commit Stage
Project Status
Available now:
- Riscy Processor Library
- Riscy BSV Utility Library
- Example multicycle processors
- https://github.com/csail-csg/riscy
Coming soon:
- High-performance In-Order processor
- Out-of-Order processor and modules
- Multicore processors
Planned work:
- Formal specifications
- Proofs for modules
- Proofs for processors
Scheduling Optimization
In Processor Design Flow
Choosing Schedules
- Typically close to reverse pipeline order, but very dependent on microarchitecture
Adapting modules to schedule
- Registers that prevent desired schedule should be replaced with EHRs
- EHRs add bypass paths between rules without breaking atomicity
For more information about EHRs, see “The Ephemeral History Register: Flexible Scheduling for Rule-Based Designs” by Daniel Rosenband
Scheduling Optimization
Consequences of Schedule Choice
Location of concurrency logic
- In the Move and Kill example, which buffer is in charge of killing moving instructions?
- One location may be more efficient than another
Existence of bypassing logic to skip latency
- Scheduling can allow some redirecting rules to update the fetch pc in the same cycle while other redirecting rules update the fetch pc for the next cycle
Priority between consumers of shared resources
- Scheduling can determine priority for ports in an arbiter
Scheduling Optimization
Out-of-Order Execution Example
Busy Table
ReservationStation
Finish
Insert
RegFile
Execute
To functional unit
From functional unit
Ordering of Insert and Finish determines where bypass path is implemented
Ordering of Insert and Execute determines if an instruction can be executed in the same cycle it is inserted into the reservation station
Ordering of Execute and Finish determines if an instruction can be executed in the same cycle its registers become ready
Connecting Modules
Buffer1
Increment
Move value Buffer 2deq enq
inc inc
The implementations of these two buffers are not independent in typical HDLs. This prevents modular design/refinement.
                                     Buffer 2 enq behavior
                                     don’t increment    increment
Buffer 1 deq     don’t increment     +0                 +1
behavior         increment           +1                 +2