Date post: 27-Sep-2020
HAsim: FPGA-Based Micro-Architecture Simulator

Michael Adler Michael Pellauer

Kermin E. Fleming* Angshuman Parashar

Joel Emer

*MIT


HAsim Is a Timing Model – Not RTL!

•  Performance models are:
   –  Highly parallel, but not easily vectorizable
   –  Pipelineable
   –  Full of communication channels
•  Programmed like a software timing model
•  FPGA is just a highly parallel execution engine
•  FPGA cycle != Model cycle
•  FPGA simulation will be faster than software if:
   –  Parallelism can overcome the ~40x clock difference
   –  I/O bandwidth is sufficient
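The break-even condition in the last bullet can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the per-model-cycle costs are assumptions chosen for the example, not HAsim measurements. Only the ~40x clock ratio comes from the slide.

```python
def fpga_speedup(sw_cycles_per_model_cycle, fpga_cycles_per_model_cycle, clock_ratio=40.0):
    """Estimate how much faster the FPGA simulates one model cycle than software.

    sw_cycles_per_model_cycle:   host CPU cycles the software simulator spends
                                 per simulated cycle (hypothetical input).
    fpga_cycles_per_model_cycle: FPGA cycles spent per simulated cycle
                                 (hypothetical input).
    clock_ratio:                 how many times faster the CPU clock is than
                                 the FPGA clock (~40x on the slide).
    A result above 1.0 means the FPGA wins.
    """
    return sw_cycles_per_model_cycle / (fpga_cycles_per_model_cycle * clock_ratio)


# Hypothetical workloads: the FPGA needs 50 of its own cycles per model cycle.
print(fpga_speedup(1_000, 50))   # software is cheap per model cycle: FPGA loses
print(fpga_speedup(10_000, 50))  # software is expensive per model cycle: FPGA wins
```

Under this simple model the FPGA only pays off when parallel hardware replaces more than about 40 host cycles of sequential software work per FPGA cycle, and the model ignores the I/O bandwidth condition entirely.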


Fast, Accurate or Now?

[Figure: trade-off among Accuracy, Development Time, and Model Speed]


FPGA Picture is Different

[Figure: the Accuracy, Development Time, and Model Speed trade-off, redrawn for FPGA-based simulation]


Reducing Development Time: Managing Complexity

•  Programming language (Bluespec)
•  Timing model infrastructure
   –  Reusable functional model
   –  Inter-module communication
   –  Tracking simulated time
•  Hybrid hardware / software models
   –  GEM5 for:
      •  Checkpoints
      •  Loading
      •  Functional memory management
      •  Emulating difficult instructions


STDIO on General Purpose Machines

     

FILE *f = fopen(path, "w");
const char *name = "Kenneth";
fprintf(f, "%s, what is the frequency?\n", name);


I/O in Hardware Description Languages (SystemVerilog)

integer f = $fopen(path, "w");
string name = "Kenneth";
$fwrite(f, "%s, what is the frequency?\n", name);


Nothing Comes from Nothing

FPGAs have:
•  No standard physical device
•  No standard device model
•  No standard system interface
•  No standard API


What Makes Hardware General Purpose?

•  The software
   –  Compilers and library APIs make code “universal”
   –  Hardware standards (ACPI, PCIe) mostly make OS development and compiler writing easier. Little impact on user programs.
   –  ISA matters if you want to avoid recompiling. ISA is part of the software API, along with standard libraries.


LEAP Platform

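[Figure: LEAP platform block diagram with an FPGA side and a Software side. Labels on the FPGA side: Timing Partition, Functional Partition, Fetch, Decode, Exe, Platform Interface, STDIO, Scratchpad Memory, Control, Remote Memory, RRR, Channel, Virtual Platform, FPGA Physical Platform. Labels on the Software side: Software Services, Streams, Memory, State, Emulate, Control, RRR, Channel, Virtual Platform, Software Physical Platform.]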


Reducing Model Complexity: Shared Functional Model

[Figure: Functional Pipeline with stages ITranslate, Fetch, Decode, Execute, DTranslate, Memory, Local Commit, and Global Commit, all operating on shared Functional State]

•  Similar philosophy to GEM5 or Asim:
   –  Single ISA functional model implementation
   –  Functional machine state is completely managed
   –  Timing models can be ISA-independent
•  Each functional pipeline stage behaves like a request/response FIFO
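As a rough illustration of the request/response FIFO idea, the Python sketch below models one functional stage in software. The class, method names, and toy program are hypothetical and chosen only for this example; HAsim itself is written in Bluespec and its real interfaces are not shown on the slide.

```python
from collections import deque


class FunctionalStage:
    """Hypothetical functional-pipeline stage seen as a request/response FIFO pair."""

    def __init__(self, name, operation):
        self.name = name
        self.operation = operation  # the functional work this stage performs
        self.requests = deque()     # requests queued by the timing model
        self.responses = deque()    # results waiting to be collected

    def make_request(self, token, payload):
        """Timing model side: enqueue work tagged with an instruction token."""
        self.requests.append((token, payload))

    def step(self):
        """Stage side: service at most one pending request, in order."""
        if self.requests:
            token, payload = self.requests.popleft()
            self.responses.append((token, self.operation(payload)))

    def has_response(self):
        return bool(self.responses)

    def get_response(self):
        """Timing model side: dequeue the oldest completed response."""
        return self.responses.popleft()


def demo_fetch():
    # Toy instruction memory; addresses and mnemonics are made up.
    program = {0: "add", 4: "load"}
    fetch = FunctionalStage("Fetch", lambda ip: program[ip])
    fetch.make_request(token=0, payload=0)
    fetch.make_request(token=1, payload=4)
    fetch.step()
    fetch.step()
    return [fetch.get_response(), fetch.get_response()]
```

Because the timing model only enqueues requests and dequeues responses, it never needs to know how many FPGA cycles a stage takes or which ISA the operation implements.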

ISPASS 2008 Paper: Quick Performance Models Quickly: Timing-Directed Simulation on FPGAs


Timing Model

[Figure: a Timing Pipeline holding a single IP sends it to the Functional Pipeline (ITranslate, Fetch, Decode, Execute, DTranslate, Memory, Local Commit, Global Commit, over Functional State) and receives the Next IP]

•  Timing & functional models communicate state using tokens
•  Minimal timing model:
   –  Only state is IP
   –  Drives a single token at a time
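The minimal timing model can be sketched in a few lines of Python. Everything here is a stand-in: the two-opcode toy ISA, the function names, and the single-register functional state are assumptions made for illustration, and the real functional pipeline is collapsed into one function call.

```python
STAGES = ["ITranslate", "Fetch", "Decode", "Execute",
          "DTranslate", "Memory", "Local Commit", "Global Commit"]


def toy_functional_model(ip, regs, program):
    """Hypothetical stand-in for the functional pipeline.

    Executes the instruction at `ip`, updates functional state in `regs`,
    and returns the next IP.
    """
    op, arg = program[ip]
    if op == "addi":
        regs["r0"] += arg
        return ip + 1
    if op == "jump":
        return arg
    raise ValueError(f"unknown opcode: {op}")


def minimal_timing_model(program, start_ip, instructions):
    ip = start_ip       # the timing model's only state
    regs = {"r0": 0}    # functional state, owned by the functional model
    trace = []
    for token in range(instructions):
        # One token in flight at a time: it visits every functional stage
        # before the next token is allocated.
        for stage in STAGES:
            trace.append((token, stage))
        ip = toy_functional_model(ip, regs, program)
    return ip, regs, trace


program = {0: ("addi", 5), 1: ("addi", 2), 2: ("jump", 0)}
final_ip, regs, trace = minimal_timing_model(program, start_ip=0, instructions=3)
```

The timing model never touches `regs`. It only learns the next IP, which is what lets the same timing code sit on top of a different ISA.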


Pipeline Parallelism

[Figure: the Timing Pipeline now sends multiple IPs to the Functional Pipeline (ITranslate, Fetch, Decode, Execute, DTranslate, Memory, Local Commit, Global Commit, over Functional State) and receives multiple Next IPs]

•  Model of a pipelined design naturally runs pipelined on an FPGA

•  Detailed model of a pipelined design runs faster than a trivial, unpipelined model!
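A back-of-the-envelope count shows why the detailed model can win. The sketch assumes every functional stage takes one step per token and that nothing stalls. Both are simplifications, and the function names are hypothetical.

```python
def steps_one_token_at_a_time(tokens, stages):
    """Unpipelined model: each token finishes all stages before the next starts."""
    return tokens * stages


def steps_pipelined(tokens, stages):
    """Pipelined model: a new token enters every step once the pipeline fills."""
    if tokens == 0:
        return 0
    return stages + tokens - 1


tokens, stages = 1000, 8
print(steps_one_token_at_a_time(tokens, stages))  # 8000
print(steps_pipelined(tokens, stages))            # 1007
```

With eight stages the pipelined model approaches an eightfold throughput advantage as the token count grows, even though it describes a more complex machine.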


Managing Time: A-Ports and Soft Connections

FPGA cycles != simulated cycles:
   –  We are building a timing model, NOT a prototype
   –  1:n cycle mapping would force us to slow the timing clock to the longest operation, even if it is infrequent
   –  1:n would force us either to fit an entire design on the FPGA or synchronize clock domains
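The cost of a fixed 1:n mapping can be seen with a small calculation. The per-cycle costs below are invented for illustration (99 model cycles that need one FPGA cycle and one that needs 20), and the function names are hypothetical.

```python
def fixed_mapping_fpga_cycles(costs):
    """Fixed 1:n mapping: every model cycle pays for the slowest operation."""
    return len(costs) * max(costs)


def variable_mapping_fpga_cycles(costs):
    """Decoupled mapping: each model cycle takes only the FPGA cycles it needs."""
    return sum(costs)


# FPGA cycles needed by each of 100 model cycles (hypothetical workload).
costs = [1] * 99 + [20]
print(fixed_mapping_fpga_cycles(costs))     # 2000
print(variable_mapping_fpga_cycles(costs))  # 119
```

In this made-up workload a single rare 20-cycle operation makes the fixed mapping roughly 17 times slower than letting model cycles vary in length.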


Option #1: Global Controller [rejected]

Central controller advances cycle when all modules are ready
•  Improvement: slowest possible cycle no longer dictates throughput
•  However:
   –  Place & route becomes difficult
   –  Long signal to global controller is on the critical path

[Figure: FET, DEC, EXE, MEM, and WB stages each connected to a central Controller that holds curCC]


Option #2: A-Ports

•  Extension of Asim ports
•  FIFO with user-specified latency and capacity
•  Manage model time by guaranteeing exactly one message per cycle through every port

[Figure: FET, DEC, EXE, MEM, and WB stages connected by A-Ports; numeric labels on the ports: 1, 1, 1, 1, 0, 2]

•  Beginning of model cycle: read all input ports
•  End of model cycle: write all output ports
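The read-then-write discipline can be sketched in software as follows. This is a simplified illustration, not the Bluespec implementation: class and function names are hypothetical, the port is pre-filled with `latency` empty messages so that a value written in model cycle c is read in cycle c + latency, and zero-latency ports (which need extra ordering care) are deliberately excluded.

```python
from collections import deque

NO_MESSAGE = None  # explicit "nothing sent in this model cycle"


class APort:
    """Sketch of an A-Port: a FIFO whose latency is counted in model cycles."""

    def __init__(self, latency, capacity=None):
        if latency < 1:
            raise ValueError("this sketch only handles latency >= 1")
        self.capacity = latency + 1 if capacity is None else capacity
        if self.capacity <= latency:
            raise ValueError("capacity must exceed latency")
        # Pre-fill so the reader sees NO_MESSAGE for the first `latency` cycles.
        self.fifo = deque([NO_MESSAGE] * latency)

    def can_read(self):
        return len(self.fifo) > 0

    def can_write(self):
        return len(self.fifo) < self.capacity

    def read(self):
        return self.fifo.popleft()

    def write(self, message):
        self.fifo.append(message)


class Module:
    """A model module that advances one model cycle whenever its ports allow."""

    def __init__(self, inputs, outputs, behavior):
        self.inputs, self.outputs, self.behavior = inputs, outputs, behavior
        self.model_cycle = 0

    def ready(self):
        return (all(p.can_read() for p in self.inputs)
                and all(p.can_write() for p in self.outputs))

    def simulate_cycle(self):
        received = [p.read() for p in self.inputs]       # beginning of model cycle
        to_send = self.behavior(self.model_cycle, received)
        for port, message in zip(self.outputs, to_send):  # end of model cycle
            port.write(message)
        self.model_cycle += 1


def run_demo(cycles=5, latency=2):
    port = APort(latency)
    seen = []

    def produce(cycle, _received):
        return [f"msg{cycle}"]

    def consume(cycle, received):
        seen.append((cycle, received[0]))
        return []

    modules = [Module([port], [], consume), Module([], [port], produce)]
    # No global controller: each module advances on its own when it is ready.
    while any(m.model_cycle < cycles for m in modules):
        for m in modules:
            if m.model_cycle < cycles and m.ready():
                m.simulate_cycle()
    return seen
```

Each module tracks its own model cycle, so neighbours may drift apart in FPGA time while the one-message-per-cycle rule keeps them consistent in model time.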

ISFPGA 2008 Paper: A-Ports: An Efficient Abstraction for Cycle-Accurate Performance Models on FPGAs


Hybrid Modeling: Software Instruction Emulation

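[Figure: timeline of software instruction emulation across the FPGA and Software sides. FPGA side: Execute, Emulation Server, Sync Registers. Software side: Emulation Server, GEM5 Functional Instruction Simulator, Memory Server, Functional Cache. Messages crossing the RRR Layer: Sync Registers, Emulate Instruction, Sync Registers, Emulation Done, Ack.]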


HAsim / LEAP Open Source

Redmine site with source and papers:

http://asim.csail.mit.edu/
