HAsim: FPGA-Based Micro-Architecture Simulator
Michael Adler, Michael Pellauer, Kermin E. Fleming*, Angshuman Parashar, Joel Emer
*MIT
HAsim Is a Timing Model – Not RTL!
• Performance models are:
  – Highly parallel, but not easily vectorizable
  – Pipelineable
  – Full of communication channels
• Programmed like a software timing model
• FPGA is just a highly parallel execution engine
• FPGA cycle != model cycle
• FPGA simulation will be faster than software if:
  – Parallelism can overcome the ~40x clock-rate gap between the host CPU and the FPGA
  – I/O bandwidth is sufficient
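A first-order reading of the last bullet, as a simplified model with assumed symbols rather than a measurement: suppose a software simulator performs P independent units of model work per model cycle one after another, at c_SW host cycles each, while the FPGA performs all of them concurrently in c_FPGA FPGA cycles per model cycle. Taking f_CPU ≈ 40 f_FPGA:

```latex
t_{\text{SW}} \approx \frac{P\,c_{\text{SW}}}{f_{\text{CPU}}}, \qquad
t_{\text{FPGA}} \approx \frac{c_{\text{FPGA}}}{f_{\text{FPGA}}}, \qquad
S = \frac{t_{\text{SW}}}{t_{\text{FPGA}}}
  = \frac{P\,c_{\text{SW}}}{c_{\text{FPGA}}}\cdot\frac{f_{\text{FPGA}}}{f_{\text{CPU}}}
  \approx \frac{P\,c_{\text{SW}}}{40\,c_{\text{FPGA}}}
```

With c_SW = c_FPGA = 1 this reduces to S ≈ P/40: the FPGA only pulls ahead once more than about 40 units of work proceed in parallel, and only while I/O is not the bottleneck.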
Fast, Accurate or Now?
[Figure: trade-off among accuracy, development time, and model speed]
FPGA Picture is Different
[Figure: the same accuracy, development time, and model speed trade-off, redrawn for FPGA-based models]
Reducing Development Time: Managing Complexity
• Programming language (Bluespec)
• Timing model infrastructure
  – Reusable functional model
  – Inter-module communication
  – Tracking simulated time
• Hybrid hardware / software models
  – GEM5 for:
    • Checkpoints
    • Loading
    • Functional memory management
    • Emulating difficult instructions
STDIO on General Purpose Machines
FILE *f = fopen(path, "w");
const char *name = "Kenneth";
fprintf(f, "%s, what is the frequency?\n", name);
I/O in Hardware Description Languages (SystemVerilog)
integer f = $fopen(path, "w");
string name = "Kenneth";
$fwrite(f, "%s, what is the frequency?\n", name);
Nothing Comes from Nothing
FPGAs have:
• No standard physical device
• No standard device model
• No standard system interface
• No standard API
What Makes Hardware General Purpose?
• The software
  – Compilers and library APIs make code “universal”
  – Hardware standards (ACPI, PCIe) mostly make OS development and compiler writing easier; they have little impact on user programs
  – The ISA matters if you want to avoid recompiling: it is part of the software API, along with the standard libraries
LEAP Platform
[Figure: LEAP platform. An FPGA side (timing partition with Fetch, Decode, Exe; functional partition; FPGA physical platform) and a software side (software services: streams, memory, state, emulate; software physical platform) each sit on a virtual platform built from services such as STDIO, scratchpad memory, remote memory, control, RRR, a channel, and a platform interface.]
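LEAP's point is that user code should target a virtual platform of services (such as STDIO) rather than a particular board, just as general-purpose software targets library APIs rather than hardware. As a loose software analogy only (LEAP itself is written in Bluespec, and every name below is hypothetical), a C program gets the same portability by calling a service that is written once against a small table of function pointers, which each physical platform fills in:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical physical-platform interface: the only layer that knows how
 * bytes actually leave the device. */
typedef struct {
    size_t (*send)(void *ctx, const char *buf, size_t len);
    void *ctx;
} phys_platform;

/* Hypothetical virtual platform: services are written once, against
 * phys_platform, and never against a specific board. */
typedef struct {
    phys_platform phys;
} virt_platform;

/* STDIO-style service: format the text, then hand it to whatever channel
 * the physical platform provides. Returns bytes delivered, or -1. */
int vp_printf(virt_platform *vp, const char *fmt, ...) {
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0) return -1;
    /* vsnprintf reports the untruncated length; clamp to what fits. */
    size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
    return (int)vp->phys.send(vp->phys.ctx, buf, len);
}

/* Back end 1: a host FILE* such as stdout. */
size_t send_to_file(void *ctx, const char *buf, size_t len) {
    return fwrite(buf, 1, len, (FILE *)ctx);
}

/* Back end 2: an in-memory buffer standing in for a raw device channel. */
typedef struct {
    char data[512];
    size_t used;
} mem_channel;

size_t send_to_mem(void *ctx, const char *buf, size_t len) {
    mem_channel *ch = ctx;
    size_t room = sizeof ch->data - 1 - ch->used;  /* keep room for '\0' */
    if (len > room) len = room;
    memcpy(ch->data + ch->used, buf, len);
    ch->used += len;
    ch->data[ch->used] = '\0';
    assert(ch->used < sizeof ch->data);
    return len;
}
```

For example, `virt_platform con = { { send_to_file, stdout } };` followed by `vp_printf(&con, "%s, what is the frequency?\n", "Kenneth");` prints the line from the STDIO slide on the console, while pointing `phys` at `send_to_mem` captures the same bytes in a buffer. The calling code does not change.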
Reducing Model Complexity: Shared Functional Model
[Figure: functional pipeline (ITranslate, Fetch, Decode, Execute, DTranslate, Memory, Local Commit, Global Commit) operating on shared functional state]
• Similar philosophy to GEM5 or Asim:
  – Single ISA functional model implementation
  – Functional machine state is completely managed
  – Timing models can be ISA-independent
• Each functional pipeline stage behaves like a request/response FIFO
ISPASS 2008 Paper: Quick Performance Models Quickly: Timing-Directed Simulation on FPGAs
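The request/response behavior of a functional stage can be sketched with two bounded queues. In this hypothetical C version, the message layout, the queue depth, and the toy Decode stage (which extracts bits 11..7 as a destination-register field) are illustrative assumptions rather than HAsim's interface. The timing model enqueues a request tagged with a token, the functional stage services it whenever it gets to it, and the timing model later dequeues the tagged response:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QCAP 8  /* hypothetical queue depth */

/* A request or response, tagged with the token it belongs to. */
typedef struct { uint32_t token; uint32_t payload; } msg;

/* Bounded ring buffer used for both the request and response queues. */
typedef struct { msg buf[QCAP]; unsigned head, count; } fifo;

bool fifo_enq(fifo *q, msg m) {
    if (q->count == QCAP) return false;      /* full: producer must retry */
    q->buf[(q->head + q->count) % QCAP] = m;
    q->count++;
    return true;
}

bool fifo_deq(fifo *q, msg *out) {
    if (q->count == 0) return false;         /* empty: consumer must wait */
    *out = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return true;
}

/* Toy functional Decode stage: answers each request with the same token and
 * the instruction's bits 11..7 (a hypothetical destination-register field).
 * Returns false if there is no request or no room for the response. */
bool decode_stage_step(fifo *req, fifo *rsp) {
    msg in;
    if (rsp->count == QCAP || !fifo_deq(req, &in)) return false;
    msg out = { in.token, (in.payload >> 7) & 0x1fu };
    bool ok = fifo_enq(rsp, out);
    assert(ok);                              /* space was checked above */
    (void)ok;
    return true;
}
```

Because the queues decouple the two sides, the timing model can issue several requests before collecting any responses, and the functional stage may spend as many FPGA cycles per request as it needs.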
Timing Model
[Figure: a timing pipeline, whose only state is the IP, sends the IP into the functional pipeline (ITranslate through Global Commit, over functional state) and receives the next IP back]
• Timing and functional models communicate state using tokens
• Minimal timing model:
  – Its only state is the IP
  – Drives a single token at a time
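A minimal timing model of this kind can be sketched as a loop that keeps only the IP and walks one token through every functional stage before starting the next. In this hypothetical C version the functional model is a stub: the stage functions, the token fields, and the toy program (4-byte instructions, with the instruction at 0x1008 jumping back to 0x1000) are illustrative assumptions, and the stage order follows the functional pipeline in the figure above:

```c
#include <assert.h>
#include <stdint.h>

/* ---- Hypothetical functional-model stub ---------------------------- */

/* A token names one in-flight instruction; here it also counts the stage
 * calls it has seen so the sketch can check that none were skipped. */
typedef struct {
    uint32_t id;
    uint64_t ip, next_ip;
    int stages_done;
} token;

static uint32_t next_token_id;  /* functional-model state, not timing state */

token fm_itranslate(uint64_t ip) {
    token t = { next_token_id++, ip, 0, 1 };
    return t;
}
void fm_fetch(token *t)  { t->stages_done++; }
void fm_decode(token *t) { t->stages_done++; }
/* Toy semantics: the instruction at 0x1008 jumps back to 0x1000; every
 * other instruction falls through to ip + 4. */
void fm_execute(token *t) {
    t->next_ip = (t->ip == 0x1008) ? 0x1000 : t->ip + 4;
    t->stages_done++;
}
void fm_dtranslate(token *t)    { t->stages_done++; }
void fm_memory(token *t)        { t->stages_done++; }
void fm_local_commit(token *t)  { t->stages_done++; }
void fm_global_commit(token *t) {
    t->stages_done++;
    assert(t->stages_done == 8);  /* one call per stage, 8 in total */
}

/* ---- Minimal timing model: its only state is the IP ----------------- */

/* Simulates n instructions, one token at a time, recording each IP in
 * trace. Returns the IP the next instruction would use. */
uint64_t run_minimal_timing_model(uint64_t ip, int n, uint64_t *trace) {
    for (int i = 0; i < n; i++) {
        trace[i] = ip;
        token t = fm_itranslate(ip);
        fm_fetch(&t);
        fm_decode(&t);
        fm_execute(&t);
        fm_dtranslate(&t);
        fm_memory(&t);
        fm_local_commit(&t);
        fm_global_commit(&t);
        ip = t.next_ip;  /* the only state carried into the next cycle */
    }
    return ip;
}
```

Calling `run_minimal_timing_model(0x1000, 5, trace)` visits 0x1000, 0x1004, 0x1008, 0x1000, 0x1004 and returns 0x1008. Because token i+1 is not created until token i has passed Global Commit, only one functional stage is busy at any moment; that idle hardware is what the pipelined model puts to work.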
Pipeline Parallelism
[Figure: the timing pipeline now sends multiple IPs into the functional pipeline (ITranslate through Global Commit, over functional state) and receives multiple next IPs back]
• Model of a pipelined design naturally runs pipelined on an FPGA
• Detailed model of a pipelined design runs faster than a trivial, unpipelined model!
Managing Time: A-Ports and Soft Connections
FPGA cycles != simulated cycles:
– We are building a timing model, NOT a prototype
– A 1:n cycle mapping would force us to slow the timing clock to the longest operation, even if that operation is infrequent
– A 1:n mapping would also force us either to fit the entire design on the FPGA or to synchronize clock domains
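The cost of tying the clock to the longest operation can be made concrete. In this hypothetical C sketch, `need[i]` is the number of fast FPGA cycles model cycle i actually requires; a fixed mapping must budget every model cycle for the worst case that can ever occur, whereas a decoupled mapping spends only what each model cycle needs. Costs are counted in periods of the fast FPGA clock, and the trace values are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed mapping: every model cycle is budgeted for worst_case FPGA cycles
 * (equivalently, the clock is slowed to the longest operation), even if
 * that operation is rare. */
uint64_t fixed_mapping_cost(const unsigned *need, size_t n, unsigned worst_case) {
    for (size_t i = 0; i < n; i++) assert(need[i] <= worst_case);
    return (uint64_t)n * worst_case;
}

/* Decoupled mapping: each model cycle takes only the FPGA cycles it needs. */
uint64_t decoupled_cost(const unsigned *need, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) total += need[i];
    return total;
}
```

With a 10-cycle operation appearing once in ten model cycles (the other nine needing one FPGA cycle each), the fixed mapping spends 100 FPGA cycles where 19 would do.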
Option #1: Global Controller [rejected]
• Central controller advances the model cycle when all modules are ready
• Improvement: the slowest possible cycle no longer dictates throughput
• However:
  – Place & route becomes difficult
  – The long signal to the global controller is on the critical path
[Figure: FET, DEC, EXE, MEM, and WB modules, each wired to a central Controller that holds the current model cycle (curCC)]
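The improvement and its price can be sketched in C. In this hypothetical model, `need[c][m]` is the number of FPGA cycles module m needs in model cycle c; the controller bumps curCC only once every module reports ready, so each model cycle costs the slowest module in that cycle rather than the slowest operation that could ever occur. The module count and the trace values are illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODULES 5  /* FET, DEC, EXE, MEM, WB */

/* Total FPGA cycles for `cycles` model cycles under a global controller:
 * a barrier per model cycle, waiting for the slowest module. */
uint64_t global_controller_cost(const unsigned need[][MODULES], size_t cycles) {
    uint64_t total = 0;
    for (size_t c = 0; c < cycles; c++) {
        unsigned slowest = 0;
        for (int m = 0; m < MODULES; m++)
            if (need[c][m] > slowest) slowest = need[c][m];
        assert(slowest > 0);  /* every model cycle takes at least one FPGA cycle */
        total += slowest;     /* barrier: curCC advances here */
    }
    return total;
}
```

Three model cycles whose slowest modules need 1, 4, and 2 FPGA cycles cost 7 in total instead of 3 × 4 = 12 under a worst-case fixed mapping. The price is the barrier itself, which in hardware becomes the long, hard-to-route signal between every module and the controller.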
Option #2: A-Ports
• Extension of Asim ports
• FIFO with user-specified latency and capacity
• Manage model time by guaranteeing exactly one message per cycle through every port
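[Figure: FET, DEC, EXE, MEM, and WB modules connected by A-Ports, each port labeled with its latency in model cycles (values 0, 1, and 2 appear)]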
• Beginning of model cycle: read all input ports
• End of model cycle: write all output ports
ISFPGA 2008 Paper: A-Ports: An Efficient Abstraction for Cycle-Accurate Performance Models on FPGAs
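An A-Port can be sketched as a FIFO pre-loaded with one "no message" bubble per cycle of latency. In this hypothetical C version, the names, the 32-bit payload, and the fixed APORT_MAX bound (standing in for the user-specified capacity) are assumptions. Every model cycle each port sees exactly one send and one receive, so a message sent in cycle t is received in cycle t + latency; a receiver whose port is still empty simply waits, which lets modules keep model time consistent without a global controller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APORT_MAX 8  /* hypothetical fixed capacity (must exceed latency) */

/* One slot per model cycle: a real message or a "no message" bubble. */
typedef struct { bool valid; uint32_t value; } aport_msg;

typedef struct {
    aport_msg buf[APORT_MAX];
    unsigned head, count;
} aport;

/* Pre-fill with `latency` bubbles so the first `latency` receives see
 * "no message" and real traffic arrives `latency` cycles after it is sent. */
void aport_init(aport *p, unsigned latency) {
    assert(latency < APORT_MAX);
    p->head = 0;
    p->count = 0;
    for (unsigned i = 0; i < latency; i++)
        p->buf[p->count++] = (aport_msg){ false, 0 };
}

/* End of model cycle: exactly one send per port, valid or not. */
void aport_send(aport *p, bool valid, uint32_t value) {
    assert(p->count < APORT_MAX);
    p->buf[(p->head + p->count) % APORT_MAX] = (aport_msg){ valid, value };
    p->count++;
}

/* Beginning of model cycle: exactly one receive per port. Returns false if
 * the producer has not sent this cycle's message yet (the consumer waits). */
bool aport_recv(aport *p, aport_msg *out) {
    if (p->count == 0) return false;
    *out = p->buf[p->head];
    p->head = (p->head + 1) % APORT_MAX;
    p->count--;
    return true;
}
```

A latency-2 port fed 10 in cycle 0 and 20 in cycle 1 yields bubbles in cycles 0 and 1, then 10 in cycle 2 and 20 in cycle 3. A latency-0 port starts empty, so its consumer must wait for the producer's send in the same model cycle.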
Hybrid Modeling: Software Instruction Emulation
[Figure: timeline of an emulated instruction between the FPGA and software. Components: Execute, Emulation Server, Memory Server, Functional Cache, and the GEM5 functional instruction simulator, connected through the RRR layer. Messages: Sync Registers, Emulate Instruction, Emulation Done, Ack.]
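The exchange in the figure can be approximated as a request/response handshake. This is a single-threaded, hypothetical C sketch: the message names mirror the figure labels, but the 4-entry register file, the "difficult" instruction (r3 = r1 * r2), the direction of each message, and the assumption that registers are synced both before and after emulation are illustrative guesses, not the actual RRR protocol. The memory server and functional cache are omitted:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NREGS 4

typedef enum { MSG_SYNC_REGS, MSG_EMULATE, MSG_EMULATION_DONE, MSG_ACK } msg_kind;

/* Stand-in for the RRR layer: just records the message order. */
typedef struct { msg_kind log[8]; int n; } rrr_channel;

typedef struct { uint64_t regs[NREGS]; } arch_state;

static void rrr_send(rrr_channel *ch, msg_kind k) {
    assert(ch->n < 8);
    ch->log[ch->n++] = k;
}

/* Software side (stand-in for GEM5): emulate the hypothetical instruction
 * "r3 = r1 * r2" on its own copy of the architectural state. */
static void sw_emulate(arch_state *sw, uint32_t insn) {
    (void)insn;  /* toy: only one difficult instruction exists */
    sw->regs[3] = sw->regs[1] * sw->regs[2];
}

/* FPGA side: Execute hits an instruction it has no hardware for. */
void fpga_emulate_instruction(arch_state *fpga, arch_state *sw,
                              rrr_channel *ch, uint32_t insn) {
    rrr_send(ch, MSG_SYNC_REGS);       /* FPGA -> SW: push register state */
    memcpy(sw, fpga, sizeof *sw);
    rrr_send(ch, MSG_EMULATE);         /* FPGA -> SW: emulate this insn */
    sw_emulate(sw, insn);
    rrr_send(ch, MSG_SYNC_REGS);       /* SW -> FPGA: updated registers */
    memcpy(fpga, sw, sizeof *fpga);
    rrr_send(ch, MSG_EMULATION_DONE);  /* SW -> FPGA: done */
    rrr_send(ch, MSG_ACK);             /* FPGA -> SW: acknowledge */
}
```

The design point it illustrates is that the FPGA needs no hardware for the rare instruction: it only needs to ship architectural state across the channel and take the result back.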
HAsim / LEAP Open Source
Redmine site with source and papers:
http://asim.csail.mit.edu/