Transactor-based Debugging of Massively Parallel Processor Array Architectures

Markus Blocherer, Srinivas Boppu, Vahid Lari, Frank Hannig, Jürgen Teich
Hardware/Software Co-Design, University of Erlangen-Nuremberg

1st International Workshop on Multicore Application Debugging (MAD 2013), November 14-15, 2013, Germany

Agenda

Slide 2

• Motivation
• Invasive Computing
• Hardware Debugging
• Transactor-based Prototyping
• Conclusions

Motivation

Slide 3

• Steady increase in application complexity
• Steady increase in the number of cores on a chip
• Customization and heterogeneity are the key to future performance gains

[Figure: tiled heterogeneous MPSoC with CPU tiles, i-Core tiles, TCPA tiles, memory and I/O tiles, connected by NoC routers]

Invasive Computing

Slide 4

• A resource-aware computing paradigm
  − Each application may use available computing resources in three phases:
    • Exploring and claiming them (invade)
    • Configuring them for parallel computing (infect)
    • Releasing them (retreat)
• Support for resource-awareness at various levels
  − Application level
  − Compiler level
  − Run-time system level
  − Architecture level
• Architecture consists of different compute tiles (tiled architecture)
  − RISC CPU tiles
  − RISC CPUs with reconfigurable fabrics
  − Programmable accelerators (TCPAs)

Invasion on TCPAs

Slide 5

Challenge: Simultaneous development of different architecture and software parts as well as their integration and validation.

[Figure: TCPA tile with IM/GC/AG units at each corner of the processor array, I/O buffers on all sides, a configuration manager, and a configuration & communication processor (LEON3) with interrupt controller, attached via the AHB bus, AHB/APB bridge, and APB bus]

Run-time system code for the three phases:

/* code to be executed sequentially */
...
val constraints = new AND();
constraints.add(new TypeConstraint(PEType.TCPA));
constraints.add(new PEquantity(4));
constraints.add(new Layout(LIN));
val claim = Claim.invade(constraints);
val ilet = (id: IncarnationID) => {
    /* code to be executed in parallel */
    ...
};
claim.infect(ilet);
claim.retreat();

• Run-time system interaction with TCPAs
  − Resource requests and releases
  − Application configuration
  − Input/output data streams

How do we prototype TCPAs with tight software/hardware interactions?

InvasIC Prototyping Platform

Slide 6

• Synopsys FPGA-based prototyping platform
  − Up to 12 million ASIC gates of capacity
  − Tools for multi-FPGA prototypes (Certify) and RTL debug (Identify)
  − UMRBus interface kit for the host workstation
  − Transactor library for AMBA to support bus-protocol communication
  − Portable hardware

[Figure: FPGA-based hardware with the DUT, a camera sensor interface, a DVI extension, and a connector to the host; the host runs the OS, run-time control, and display driver]

Typical HDL-based Development

Slide 7

[Figure: DUT with I/O buffers inside a VHDL testbench, simulated in an HDL simulator (ModelSim)]

HDL-Bridge-based Debugging

Slide 8

[Figure: DUT with I/O buffers inside a VHDL testbench in the HDL simulator (ModelSim), connected to the outside through a hardware wrapper and a software wrapper]

Synopsys Transactor Library

Slide 9

• Library offers UMRBus-based transactors
  − AMBA
  − UART
  − GPIO
  − …
• C++ and Tcl API
• Easy to integrate into existing RTL designs

[Figure: host-side read/write initiators communicate over the UMRBus with CAPIMs; an ahb_master transactor turns write()/read() API calls into AHB bus transfers, and an ahb_slave transactor delivers callbacks to the host]

Evaluation

Slide 10

Approach          Performance  Cycle accuracy  Signal observability  Intended use
HDL-Simulation    slowest      yes             high                  hardware development
HDL-Bridge        slow         yes             medium                hardware debugging
AMBA-Transactor   high         no              low                   integration and extended testing

• Hardware development and debugging require cycle accuracy and highly flexible means to observe individual signals
• For software development and testing, performance is a key feature besides observability of registers

Test Application

Slide 11

A secondary application pre-occupies a number of PEs on the target TCPA tile.

The main video-based application (edge detection) then tries to capture the remaining PEs on the TCPA tile while satisfying the following properties:

• Guaranteed constant throughput for a 1024x768 frame resolution
• Dynamic adaptation of quality of service (Laplace or Sobel)

[Figure: TCPA tile (as on Slide 5) with Rx/Tx links to a DVI extension board]

Hardware/Software Interactions

Slide 12

[Figure: AMBA AHB transactor attached to the AHB bus of the TCPA tile; the tile contains the configuration & communication processor (LEON3), interrupt controller, AHB/APB bridge, IMs, configuration manager, I/O buffers, and Rx/Tx links to the DVI extension board]

Interaction sequence:

1. LEON3: invade request for n PEs (request an arbitrary number of PEs for the secondary application)
2. TCPA: invasion on the invasion controllers
3. LEON3: response to the invasion request (n PEs)
4. LEON3: invade request for 25 PEs (request 25 PEs for the edge detection application)
5. TCPA: invasion on the invasion controllers
6. LEON3: response to the invasion request; receive the number of invaded PEs (m)
7. Select a configuration based on m:
   − if 2 < m < 9: load Sobel 1x3 configuration
   − if 8 < m < 25: load Laplace 3x3 configuration
   − if m == 25: load Laplace 5x5 configuration
8. Send the configuration stream and start computation
9. TCPA: application execution
10. Application termination and resource release request

Application Scenarios / Results

Slide 13

Experimental Setup

Slide 14

[Figure: master transactor and four LEON3 cores (CORE 0 to CORE 3) with static RAM on the AHB bus]

• Step 1
  − Write data to the RAM
  − Measure the data rate
• Step 2
  − Read data from the RAM
  − Measure the data rate

Master Transactor Data Rate

Slide 15

Block size (bytes)   Write (MBytes/sec)   Read (MBytes/sec)
128                  0.261                0.331
256                  0.631                0.744
512                  1.005                1.56
1K                   2.466                2.907
2K                   3.487                4.584
4K                   6.666                6.132
8K                   9.138                7.344
16K                  13.388               8.576
32K                  17.724               8.98
64K                  20.798               9.18
128K                 23.174               9.458

Software Development

Slide 16

• GRMON: general debug monitor for the LEON3 processor
  − Read/write access to all system registers and memory
  − Built-in disassembler and trace buffer management
  − Downloading and execution of LEON applications
  − Breakpoint and watchpoint management
  − Support for USB, JTAG, RS232, PCI, and Ethernet debug links
  − Tcl interface (scripts, procedures, variables, loops, etc.)
• Challenges
  − Initial situation offered by GAISLER: a bus-based MPSoC with up to 16 cores and only one GRMON instance
  − But we need one GRMON instance per tile:
    • Each instance needs a separate connection medium to CHIPit
    • Synchronization between the tiles

GRMON Debugging

Slide 17

[Figure: tiled MPSoC (CPU, i-Core, TCPA, memory, and I/O tiles with NoC routers); each CPU/i-Core tile carries its own DEBUG unit]

• Data transfer
  − I/O tile
  − Direct to the tiles
• Debug
  − Debug unit per tile
  − GAISLER (GRMON)

Multiple Transactor-based Debugging

Slide 18

[Figure: tiled MPSoC (CPU, i-Core, TCPA, memory, and I/O tiles with NoC routers); each CPU/i-Core tile is attached to its own AMBA transactor]

• Data transfer
  − I/O tile
  − Direct to the tiles
• Debug
  − AMBA transactor per tile
  − GAISLER (GRMON)

Conclusions

Slide 19

• HDL-Bridge-based debugging enables efficient and precise hardware development on multiple FPGAs
• The AHB transactor interface eases connectivity and control over the FPGA-based prototype
• Transactor-based debugging offers fast and scalable hardware/software interaction for heterogeneous MPSoCs
• Our FPGA-based prototyping approach is feasible for MPSoC validation and demonstration

Thank you for your attention!

Slide 20

Transactor-based Debugging of Massively Parallel Processor Array Architectures

Contact: Markus Blocherer
Hardware/Software Co-Design
Universität Erlangen-Nürnberg
Cauerstraße 11, 91058 Erlangen, Germany
Email: [email protected]
www.invasive-computing.org

