Howie Mao, Jerry Zhao
UC Berkeley
{zhemao,jzh}@berkeley.edu
Chipyard Basics
Motivation
Berkeley Architecture Research has developed and open-sourced:
• Chisel
• FIRRTL
• RISC-V
• Rocket Core
• BOOM Core
• TileLink
• Accelerators
• Caches
• Peripherals
• Diplomacy
• Configuration System
• FireSim
• HAMMER
Goal:
• Make it easy for small teams to design, integrate, simulate, and tape out a custom SoC
[Figure: the projects above combined under Chipyard]
Chipyard
Tooling:
• Chisel
• FIRRTL
• RISC-V
Rocket Chip Generators:
• Rocket Core
• BOOM Core
• TileLink
• Accelerators
• Caches
• Peripherals
• Diplomacy
• Configuration System
Flows:
• FireSim
• HAMMER
• Software RTL Simulation
Chipyard SW RTL Simulation
[Figure: a Custom SoC Configuration drives the RTL Generators (RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators). The RTL Build Process applies Transforms for SW Sim to the FIRRTL IR and emits Behavioral Verilog for Software RTL Simulation with VCS or Verilator.]
Chipyard targeting FireSim
[Figure: a Custom SoC Configuration drives the RTL Generators (RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators). The RTL Build Process applies Transforms for FireSim to the FIRRTL IR and emits FireSim Verilog for FireSim FPGA-Accelerated Simulation (Simulation, Debugging, Networking).]
Chipyard VLSI Flow
[Figure: a Custom SoC Configuration drives the RTL Generators (RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators). The RTL Build Process applies Transforms for VLSI to the FIRRTL IR and emits VLSI Verilog for the Automated VLSI Flow (Hammer, Tech-plugins, Tool-plugins).]
Chipyard Unified Flows
[Figure: one Custom SoC Configuration and one set of RTL Generators (RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators) feed a shared RTL Build Process. From the same FIRRTL IR, Transforms for SW Sim emit Behavioral Verilog for Software RTL Simulation (VCS, Verilator), Transforms for FireSim emit FireSim Verilog for FireSim FPGA-Accelerated Simulation (Simulation, Debugging, Networking), and Transforms for VLSI emit VLSI Verilog for the Automated VLSI Flow (Hammer, Tech-plugins, Tool-plugins).]
Tutorial Roadmap
[Figure: tutorial roadmap covering the Custom SoC Configuration; RTL Generators (RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators); the RTL Build Process (FIRRTL IR, FIRRTL Transforms, Verilog); FireMarshal (Bare-metal & Linux, Custom Workload, QEMU & Spike); Software RTL Simulation (VCS, Verilator); FireSim FPGA-Accelerated Simulation (Simulation, Debugging, Networking); and the Automated VLSI Flow (Hammer, Tech-plugins, Tool-plugins).]
Chipyard Tooling
Chisel
• Chisel – Hardware Construction Language built on Scala
• What Chisel IS NOT:
  • NOT Scala-to-gates
  • NOT HLS
  • NOT tool-oriented language
• What Chisel IS:
  • Productive language for generating hardware
  • Leverage OOP/Functional programming paradigms
  • Enables design of parameterized generators
  • Designer-friendly: low barrier-to-entry, high reward
  • Backwards-compatible: integrates with Verilog black-boxes
[Figure: flow from Chisel through FIRRTL and Verilog to VLSI]
Chisel Example
// 3-point moving average implemented in the style of a FIR filter
import chisel3._

class MovingAverage3 extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(32.W))
    val out = Output(UInt(32.W))
  })
  val z1 = RegNext(io.in) // input delayed by one cycle
  val z2 = RegNext(z1)    // input delayed by two cycles
  io.out := io.in + z1 + z2
}
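[Figure: block diagram of the 3-tap filter: 32-bit input in, delay registers z1 and z2, coefficients 1, 1, 1, and the summed 32-bit output out]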
Chisel Example
// Generalized FIR filter parameterized by coefficients
import chisel3._

class FirFilter(bitWidth: Int, coeffs: Seq[Int]) extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(bitWidth.W))
    val out = Output(UInt(bitWidth.W))
  })
  // zs(i) holds the input delayed by i cycles
  val zs = Wire(Vec(coeffs.length, UInt(bitWidth.W)))
  zs(0) := io.in
  for (i <- 1 until coeffs.length) {
    zs(i) := RegNext(zs(i-1))
  }
  // Multiply each tap by its coefficient, then sum the products
  val products = zs zip coeffs map {
    case (z, c) => z * c.U
  }
  io.out := products.reduce(_ + _)
}
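[Figure: block diagram of the N-tap FIR filter: W-bit input in, delay registers z1 through zN-1, coefficients c0 through cN-1, and the summed W-bit output out]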
Chisel Example
// Basic implementation
val basic3Filter = Module(new MovingAverage3)
// Parameterized implementation
val better3Filter = Module(new FirFilter(32, Seq(1, 1, 1)))
// Generator is reusable
val delayFilter = Module(new FirFilter(8, Seq(0, 1)))
val triangleFilter = Module(new FirFilter(8, Seq(1, 2, 3, 2, 1)))
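A software reference model can help check what each generated filter should compute. The following is a minimal sketch in plain Scala (no Chisel dependency). The name FirModel is hypothetical, and the model assumes the delay registers start at zero, which the slides do not specify.

```scala
// Plain-Scala reference model of the FirFilter generator (illustrative only).
object FirModel {
  /** Returns one output per input sample, truncated to bitWidth bits. */
  def run(bitWidth: Int, coeffs: Seq[Int], inputs: Seq[BigInt]): Seq[BigInt] =
    if (inputs.isEmpty) Seq.empty
    else {
      val mask = (BigInt(1) << bitWidth) - 1
      // Assumption: every delay register holds zero before the first sample.
      val zeros = Seq.fill(coeffs.length - 1)(BigInt(0))
      (zeros ++ inputs).sliding(coeffs.length).map { window =>
        // history(0) is the current sample, history(i) the sample i cycles ago
        val history = window.reverse
        history.zip(coeffs).map { case (z, c) => z * c }.sum & mask
      }.toSeq
    }
}
```

For example, FirModel.run(8, Seq(0, 1), samples) models delayFilter: each output is the previous input sample.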
FIRRTL – LLVM for Hardware
FIRRTL emits tool-friendly, synthesizable Verilog
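[Figure: compiler analogy. C/C++ and Rust compile to LLVM IR, where the LLVM PassManager runs passes (dead code elimination, statistics collection, optimization) before emitting x86 or ARM assembly. Chisel and Verilog compile to FIRRTL IR, where FIRRTL Passes (dead expression elimination, statistics collection, netlist manipulation) run before emitting Verilog for SW Sim or Verilog for FPGA Sim.]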
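The pass-pipeline idea can be sketched with a toy IR. This is an illustration in plain Scala, not the FIRRTL API: Node, Circuit, Pass, and the pass names are all hypothetical. It shows a transforming pass (dead node elimination) and an analysis pass (statistics collection) chained over the same circuit.

```scala
// Toy IR and pass pipeline; an illustration only, not the FIRRTL API.
object ToyPasses {
  // A node names a value computed from other named values.
  final case class Node(name: String, deps: Seq[String])
  final case class Circuit(outputs: Set[String], nodes: Seq[Node])

  type Pass = Circuit => Circuit

  // Removes every node that no output transitively depends on.
  val deadNodeElimination: Pass = { c =>
    val byName = c.nodes.map(n => n.name -> n).toMap
    @annotation.tailrec
    def reach(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        val deps = frontier.flatMap(n => byName.get(n).toSeq.flatMap(_.deps))
        reach(deps -- seen -- frontier, seen ++ frontier)
      }
    val live = reach(c.outputs, Set.empty)
    c.copy(nodes = c.nodes.filter(n => live(n.name)))
  }

  // A statistics pass reports on the circuit and returns it unchanged.
  def countNodes(report: Int => Unit): Pass = { c => report(c.nodes.size); c }

  // Runs the passes in order, feeding each result to the next pass.
  def runAll(passes: Seq[Pass], c: Circuit): Circuit =
    passes.foldLeft(c)((acc, p) => p(acc))
}
```

Because every pass has the same Circuit => Circuit shape, passes for different targets can be mixed and reordered without changing the frontend.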
Rocket Chip Generators
What is Rocket Chip?
• A highly parameterizable and modular SoC generator
  • Replace default Rocket core w/ your own core
  • Add your own coprocessor
  • Add your own SoC IP to uncore
• A library of reusable SoC components
  • Memory protocol converters
  • Arbiters and Crossbar generators
  • Clock-crossings and asynchronous queues
• The largest open-source Chisel codebase
• Developed at Berkeley, now maintained by many
  • SiFive, CHIPS Alliance, Berkeley
Generating Varied SoCs
In academia: UCB Hurricane-1
In industry: SiFive Freedom E310
Used in Many Tapeouts
Structure of a Rocket Chip SoC
Tiles: unit of replication for a core
• CPU
• L1 Caches
• Page-table walker
L2 banks:
• Receive memory requests
FrontBus:
• Connects to DMA devices
ControlBus:
• Connects to core-complex devices
PeripheryBus:
• Connects to other devices
SystemBus:
• Ties everything together
The Rocket In-Order Core
• First open-source RISC-V CPU
• In-order, single-issue RV64GC core
• Floating-point via Berkeley hardfloat library
• RISC-V Compressed
• Physical Memory Protection (PMP) standard
• Supervisor ISA and Virtual Memory
• Boots Linux
• Supports Rocket Chip Coprocessor (RoCC) interface
• L1 I$ and D$
• Caches can be configured as scratchpads
BOOM: The Berkeley Out-of-Order Machine
• Superscalar RISC-V OoO core
• Fully integrated in Rocket Chip ecosystem
• Open-source
• Described in Chisel
• Parameterizable generator
• Taped-out (BROOM at HC18)
• Full RV64GC ISA support
  • FP, RVC, Atomics, PMPs, VM, Breakpoints, RoCC
• Runs real OSes, software
• Drop-in replacement for Rocket
[Figure: the BOOM core inside a BOOMTile]
RoCC Accelerators
• RoCC: Rocket Chip Coprocessor
• Execute custom RISC-V instructions for a custom extension
• Examples of RoCC accelerators
  • Vector accelerators
  • Memcpy accelerator
  • Machine-learning accelerators
  • Java GC accelerator
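For reference, a custom RoCC instruction is an R-type RISC-V word that uses one of the four custom major opcodes, with the three funct3 bits repurposed as the xd, xs1, and xs2 flags. The sketch below packs such a word in plain Scala. The object and helper names are hypothetical; only the bit layout and opcode values are taken from the RISC-V and RoCC conventions.

```scala
// Sketch of the RoCC instruction word layout; helper names are illustrative.
object RoccEncoding {
  // Major opcodes reserved by RISC-V for custom extensions (custom-0 to custom-3).
  val Custom0 = 0x0B
  val Custom1 = 0x2B
  val Custom2 = 0x5B
  val Custom3 = 0x7B

  /** Packs funct7[31:25] rs2[24:20] rs1[19:15] xd[14] xs1[13] xs2[12] rd[11:7] opcode[6:0]. */
  def encode(opcode: Int, funct7: Int, rd: Int, rs1: Int, rs2: Int,
             xd: Boolean, xs1: Boolean, xs2: Boolean): Int = {
    require(funct7 >= 0 && funct7 < 128, "funct7 is 7 bits")
    require(Seq(rd, rs1, rs2).forall(r => r >= 0 && r < 32), "register indices are 5 bits")
    def bit(b: Boolean): Int = if (b) 1 else 0
    (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
      (bit(xd) << 14) | (bit(xs1) << 13) | (bit(xs2) << 12) |
      (rd << 7) | opcode
  }
}
```

The xd flag tells the core to wait for a result in rd, while xs1 and xs2 tell it to send the values of rs1 and rs2 to the accelerator.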
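[Figure: a Tile containing a BOOM/Rocket core with L1I$, L1D$, PTW, and TLBs, plus a decoupled RoCC Accelerator that receives instructions (inst) from the core and returns results (wb). The tile connects through the L2 and the SystemBus to the Core Complex Peripherals.]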
L2 Cache and Memory System
• Multi-bank shared L2
  • SiFive’s open-source IP
  • Fully coherent
  • Configurable size, associativity
  • Supports atomics, prefetch hints
• Non-caching L2 Broadcast Hub
  • Coherence w/o caching
  • Bufferless design
• Multi-channel memory system
  • Conversion to AXI4 for compatible DRAM controllers
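To make "configurable size, associativity" and "multi-bank" concrete, here is generic cache arithmetic in plain Scala. It is a sketch under stated assumptions, not the L2 generator's implementation: L2Params and its fields are hypothetical, the capacity is assumed to be the total across all banks, and consecutive blocks are assumed to interleave across banks.

```scala
// Generic cache geometry arithmetic; names and policies are assumptions.
object L2Geometry {
  final case class L2Params(capacityKB: Int, ways: Int, blockBytes: Int, banks: Int) {
    require(Seq(capacityKB, ways, blockBytes, banks).forall(_ > 0), "all parameters must be positive")

    // Assumption: capacityKB is the total capacity, split evenly across banks.
    val setsPerBank: Int = capacityKB * 1024 / (ways * blockBytes * banks)

    // Assumption: consecutive cache blocks are interleaved across banks.
    def bankOf(addr: Long): Int = ((addr / blockBytes) % banks).toInt
  }
}
```

With a 1024 KB capacity, 8 ways, 64-byte blocks, and 4 banks (all but the capacity are made-up values), each bank holds 1048576 / (8 × 64 × 4) = 512 sets.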
Core Complex Devices
• BootROM
  • First-stage bootloader
  • DeviceTree
• PLIC
• CLINT
  • Software interrupts
  • Timer interrupts
• Debug Unit
  • DMI
  • JTAG
Other Chipyard Blocks
• Hardfloat: Parameterized Chisel generators for hardware floating-point units
• IceNet: Custom NIC for FireSim simulations
• SiFive-Blocks: Open-sourced Chisel peripherals
  • GPIO, SPI, UART, etc.
• TestchipIP: Berkeley utilities for chip testing/bringup
  • Tethered serial interface
  • Simulated block device
• Hwacha: Decoupled vector-fetch RoCC accelerator
• SHA3: Educational SHA3 RoCC accelerator
TileLink Interconnect
• Free and open chip-scale interconnect standard
• Supports multiprocessors, coprocessors, accelerators, DMA, peripherals, etc.
• Provides a physically addressed, shared-memory system
• Supports cache-coherent shared memory, MOESI-equivalent protocol
• Verifiable deadlock freedom for conforming SoCs
TileLink Interconnect
• Three different protocol levels with increasing complexity
  • TL-UL (Uncached Lightweight)
  • TL-UH (Uncached Heavyweight)
  • TL-C (Cached)
• Rocket Chip provides library of reusable TileLink widgets
  • Conversion to/from AXI4, AHB, APB
  • Conversion among TL-UL, TL-UH, TL-C
  • Crossbar generator
  • Width / logical size converters
  • TLMonitor conformance checker
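The three conformance levels nest: each level supports everything below it and adds more. The sketch below models that in plain Scala, using the channel A request names and channel letters from the TileLink specification. The object, the rank encoding, and the helper functions are hypothetical and exist only for illustration.

```scala
// Toy model of TileLink conformance levels; not part of Rocket Chip.
object TileLinkLevels {
  sealed abstract class Level(val rank: Int)
  case object TLUL extends Level(0) // Uncached Lightweight
  case object TLUH extends Level(1) // Uncached Heavyweight
  case object TLC  extends Level(2) // Cached

  // Lowest conformance level at which each channel A request is available.
  val introducedAt: Map[String, Level] = Map(
    "Get" -> TLUL, "PutFullData" -> TLUL, "PutPartialData" -> TLUL,
    "ArithmeticData" -> TLUH, "LogicalData" -> TLUH, "Intent" -> TLUH,
    "AcquireBlock" -> TLC, "AcquirePerm" -> TLC)

  // A level supports every request introduced at or below it.
  def supports(level: Level, op: String): Boolean =
    introducedAt.get(op).exists(_.rank <= level.rank)

  // Only TL-C uses the B, C, and E channels that carry coherence traffic.
  def channels(level: Level): Seq[Char] = level match {
    case TLC => Seq('A', 'B', 'C', 'D', 'E')
    case _   => Seq('A', 'D')
  }
}
```

A converter between levels has to handle exactly the requests for which supports differs between its two sides.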
Integration
[Figure: the same Custom SoC Configuration (RISC-V Cores, Multi-level Caches, Custom Verilog, Peripherals, Accelerators) produces three targets: a Software RTL Simulator, a FireSim FPGA Image, and GDS]
Diplomacy
Problem: Interconnects are difficult to parameterize correctly
• Complex interconnect graph with many nodes
• Nodes are independently parameterized
Diplomacy: Framework for negotiating parameters between Chisel generators
• Graphical abstraction of interconnectivity
• Diplomatic lazy modules follow two-phase elaboration
  • Phase one: nodes exchange configuration information with each other and decide final parameters
  • Phase two: Chisel RTL elaborates using calculated parameters
• Used extensively by Rocket Chip TileLink generators
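The two phases can be imitated in plain Scala with a lazy val. This is a toy sketch, not the Diplomacy API: Node, negotiate, and the "widest requested bus wins" policy are all invented for illustration. The point is only that the hardware description is deferred until negotiation has finished.

```scala
// Toy two-phase elaboration; names and the negotiation policy are invented.
object TwoPhaseSketch {
  final class Node(val name: String, val wantedBits: Int) {
    // Filled in by phase one.
    var agreedBits: Option[Int] = None
    // Phase two: evaluated lazily, so it sees the negotiated value.
    lazy val rtl: String = s"$name: data bus is ${agreedBits.get} bits"
  }

  // Phase one: connected nodes settle on the widest requested bus.
  def negotiate(nodes: Seq[Node]): Unit = {
    val width = nodes.map(_.wantedBits).max
    nodes.foreach(n => n.agreedBits = Some(width))
  }
}
```

Reading rtl before negotiate would fail, which mirrors why a lazy module's hardware body must not be forced until every node's parameters are final.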
Diplomacy Examples
Diplomatic parameters
• Type and size of supported operations
• Physical memory attributes – modifiability, executability, cacheability
• Ordering requirements on operations (ex: FIFO)
• Presence and widths of fields in wire bundles (ex: source ID bits)
Useful applications:
• Automatically insert TLMonitor protocol correctness checkers
• Discover AtomicAutomata topology violations
Diplomacy Example
[Figure: a Crossbar connects Clients (L1D$, L1I$, AXI to TL) to Managers (BootROM, L2, TL to AXI). The clients advertise source ID ranges [0,1), [0,2), and [0,4). The managers advertise address ranges [0x0, 0x1000), [0x8000, 0xA000), and [0xA000, 0xB000).]
Diplomacy Example
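[Figure: source ID ranges propagate from the Clients (L1D$, L1I$, AXI to TL) through the Crossbar toward the Managers (BootROM, L2, TL to AXI). The clients start with [0,1), [0,2), and [0,4); other edges in the graph carry [0,4) and [0,8).]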
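One simple way a crossbar could combine client source ranges is to pack them side by side and size the outgoing source field to fit. The sketch below does that in plain Scala. It is a simplification with hypothetical names, not Rocket Chip's crossbar code, whose actual assignment policy may differ.

```scala
// Simplified source ID packing; not Rocket Chip's crossbar implementation.
object SourceIds {
  final case class IdRange(start: Int, end: Int) { // half-open: [start, end)
    def size: Int = end - start
  }

  // Give each client a disjoint slice of the crossbar's outgoing ID space.
  def pack(clients: Seq[IdRange]): Seq[IdRange] = {
    val offsets = clients.scanLeft(0)(_ + _.size)
    clients.zip(offsets).map { case (c, off) => IdRange(off, off + c.size) }
  }

  // Bits needed in the source field to name every packed ID.
  def sourceBits(clients: Seq[IdRange]): Int = {
    val total = clients.map(_.size).sum
    if (total <= 1) 0 else 32 - Integer.numberOfLeadingZeros(total - 1)
  }
}
```

For the client ranges [0,1), [0,2), and [0,4), seven IDs are needed in total, so the source field takes 3 bits, which is consistent with the [0,8) range on the slide.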
Diplomacy Example
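[Figure: address ranges propagate from the Managers (BootROM, L2, TL to AXI) through the Crossbar toward the Clients (L1D$, L1I$, AXI to TL). The managers start with [0x0, 0x1000), [0x8000, 0xA000), and [0xA000, 0xB000); other edges in the graph carry the combined ranges [0x8000, 0xB000) and [0x0, 0xB000).]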
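The address side of the negotiation can be sketched as merging the ranges that managers advertise. The plain-Scala sketch below joins only ranges that touch or overlap. The names are hypothetical, and this is not how Rocket Chip represents address sets internally.

```scala
// Toy address range merging; not Rocket Chip's address set representation.
object AddressMerge {
  final case class AddressRange(base: Long, limit: Long) // half-open: [base, limit)

  // Sorts by base address, then folds touching or overlapping ranges together.
  def merge(ranges: Seq[AddressRange]): Seq[AddressRange] =
    ranges.sortBy(_.base).foldLeft(List.empty[AddressRange]) {
      case (prev :: rest, r) if r.base <= prev.limit =>
        AddressRange(prev.base, math.max(prev.limit, r.limit)) :: rest
      case (acc, r) => r :: acc
    }.reverse
}
```

Merging [0x8000, 0xA000) with [0xA000, 0xB000) yields the [0x8000, 0xB000) range from the slide, while [0x0, 0x1000) stays separate because a gap follows it.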
Diplomacy-generated Graph
[Figure: Diplomacy-generated graph with nodes for the Tile, Front Bus, System Bus, Control Bus, L2 InclusiveCache, and Memory Bus]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new hwacha.DefaultHwachaConfig ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(3) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
[Figure: a TestHarness containing SimBlockDevice, SimAXIMem, and Top. Top has Tiles 0 to 2, each with a 3-wide BOOM, L1I$, L1D$, and Hwacha, plus the SysBus, MemBus, BootROM, L2, and GPIOs.]
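Config fragments chained with ++ behave like layered lookups: a fragment on the left takes priority over the fragments to its right, so the base config goes last. The toy model below shows that rule in plain Scala. It is not the Rocket Chip Config class; the string keys, the Fragment type, and the stand-in fragment names are all hypothetical.

```scala
// Toy model of config fragment composition; not the Rocket Chip Config API.
object ToyConfig {
  final case class Fragment(settings: Map[String, Any]) {
    // Fragments on the left override fragments on the right.
    def ++(rhs: Fragment): Fragment = Fragment(rhs.settings ++ settings)
    def apply(key: String): Any = settings(key)
  }

  // Stand-ins for a base config and two overriding fragments.
  val Base: Fragment =
    Fragment(Map("nBoomCores" -> 0, "l2CapacityKB" -> 512, "gpio" -> false))
  def WithNBoomCores(n: Int): Fragment = Fragment(Map("nBoomCores" -> n))
  def WithInclusiveCache(capacityKB: Int): Fragment =
    Fragment(Map("l2CapacityKB" -> capacityKB))
}
```

In this model, WithNBoomCores(3) ++ WithInclusiveCache(1024) ++ Base resolves nBoomCores to 3 and l2CapacityKB to 1024, while gpio falls through to the base value.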
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new hwacha.DefaultHwachaConfig ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
[Figure: the same TestHarness and Top, now with Tiles 0 and 1 holding 3-wide BOOM cores and Tile 2 holding a Rocket core. Each tile keeps its L1I$, L1D$, and Hwacha.]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
[Figure: the same TestHarness and Top, now with a different accelerator per tile. Tile 0 is a 3-wide BOOM with Hwacha, Tile 1 is a 3-wide BOOM with SHA3, and Tile 2 is a Rocket with ConvNN.]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithBootROM ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
[Figure: the same TestHarness and Top, with Tile 2 now an RV32 Rocket with ConvNN. Tile 0 remains a 3-wide BOOM with Hwacha and Tile 1 a 3-wide BOOM with SHA3.]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithJtagDTM ++
  new WithBootROM ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
[Figure: the same TestHarness and Top as before, with a JTAG interface added. Tile 0 is a 3-wide BOOM with Hwacha, Tile 1 a 3-wide BOOM with SHA3, and Tile 2 an RV32 Rocket with ConvNN.]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithJtagDTM ++
  new WithBootROM ++
  new WithRenumberHarts(rocketFirst=true) ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
[Figure: the same TestHarness and Top with JTAG, with the tiles renumbered so the Rocket comes first. Tile 0 is the RV32 Rocket with ConvNN, Tile 1 a 3-wide BOOM with Hwacha, and Tile 2 a 3-wide BOOM with SHA3.]
Rocket Chip Configuration
class MyCustomConfig extends Config(
  new WithExtMemSize((1<<30) * 2L) ++
  new WithBlockDevice ++
  new WithGPIO ++
  new WithJtagDTM ++
  new WithBootROM ++
  new WithRenumberHarts(rocketFirst=true) ++
  new WithRationalBoomTiles ++
  new WithRationalRocketTiles ++
  new WithMultiRoCCConvAccel(2) ++
  new WithMultiRoCCSha3(1) ++
  new WithMultiRoCCHwacha(0) ++
  new WithInclusiveCache(capacityKB=1024) ++
  new boom.common.WithLargeBooms ++
  new boom.system.WithNBoomCores(2) ++
  new rocketchip.subsystem.WithRV32 ++
  new rocketchip.subsystem.WithNBigCores(1) ++
  new WithNormalBoomRocketTop ++
  new rocketchip.system.BaseConfig)
[Figure: the same renumbered system with JTAG, now with each tile in its own clock domain: clk_0 for Tile 0 (RV32 Rocket with ConvNN), clk_1 for Tile 1 (3-wide BOOM with Hwacha), and clk_2 for Tile 2 (3-wide BOOM with SHA3).]
Chipyard is Community-Friendly
Documentation:
• https://chipyard.readthedocs.io/en/dev/
• 85 pages
• Documents components, flows
• Links to sub-project documentation
• Most of today’s tutorial content is covered there
Continuous Integration:
• Cloud-hosted
• https://circleci.com/gh/ucb-bar/chipyard/tree/master
Chipyard is Research-Friendly
• Add new accelerators/custom instructions
• Modify OS/driver/software
• Perform design-space exploration across many parameters
• Test in software and FPGA-sim before tape-out
Stay tuned for Chipyard-based research from Berkeley
• New chips
• New accelerators
High-level questions?
• Next is a hands-on tutorial led by Abe Gonzalez