Interconnect Architectures for Modulo-Scheduled Coarse-Grained Reconfigurable Arrays

Ahmed Hassan Mohammed

Outline

Introduction

Reconfigurable Arrays

Device Architecture

Mapping Technology

Proposed Architectures

Experimental Results

Overall Results

References


Introduction

Platforms

Figure: ASIC, reconfigurable arrays, and µProcessors compared along two axes, performance and flexibility.


Reconfigurable Arrays

                     Fine-grained       Coarse-grained
Purpose              General purpose    Application specific
Basic unit           LUT                ALU
Level                bit-level          word-level
Re-configurability   High overhead      Reduced overhead
Performance          Low                High



Device Architecture


Mapping Technology

Figure: a dataflow graph and the architecture it is mapped onto.

Figure: Iteration 1, Iteration 2, and Iteration 3 of the loop.
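Modulo scheduling, named in the title, overlaps successive loop iterations such as Iteration 1, 2, and 3 above: a new iteration starts every fixed number of cycles, before the previous one has finished. The sketch below illustrates only that timing. The names `ii` (initiation interval, the cycles between consecutive iteration starts) and `schedule_length` (cycles one iteration takes), and the example values, are assumptions for illustration and do not come from the slides.

```python
from typing import List


def iteration_start_cycles(num_iterations: int, ii: int) -> List[int]:
    """Start cycle of each iteration: iteration k (0-based) starts at k * ii."""
    return [k * ii for k in range(num_iterations)]


def active_iterations(cycle: int, num_iterations: int, ii: int,
                      schedule_length: int) -> List[int]:
    """1-based indices of the iterations that are executing during `cycle`."""
    starts = iteration_start_cycles(num_iterations, ii)
    return [k + 1 for k, start in enumerate(starts)
            if start <= cycle < start + schedule_length]


def total_cycles(num_iterations: int, ii: int, schedule_length: int) -> int:
    """Cycles needed to finish all iterations when they overlap."""
    return (num_iterations - 1) * ii + schedule_length


if __name__ == "__main__":
    # Hypothetical values: 3 iterations, ii = 2, 6 cycles per iteration.
    for cycle in range(total_cycles(3, 2, 6)):
        print(cycle, active_iterations(cycle, 3, 2, 6))
```

With these hypothetical values the three iterations finish in 10 cycles rather than the 18 a non-overlapped execution would need, and all three are in flight at cycle 4.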


Proposed Architectures

Closest Topology

Clique Topology

Directional Topology

Heterogeneous Topology



Example

Finput: the number of possible inputs for a CFU

Figure: an example with Finput = 3.




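Closest Topology

Label by the closest: CFUs are labeled by closeness (labels 2 to 7 in the figure), and a CFU is connected as an input when Label <= Finput.

Figure: the labeled array and the resulting connections for Finput = 2, 3, 4, and 5.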
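The rule "Label <= Finput" can be illustrated with a small sketch. It is a simplification under stated assumptions, not the labeling used in the slides: closeness is taken to be Manhattan distance on a rectangular grid, ties are broken by row and then column, each label is used once, and labels start at 2 as in the figure (what label 1 denotes is not recoverable from the slides). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def label_by_closeness(rows: int, cols: int, r: int, c: int) -> Dict[Cell, int]:
    """Assign labels 2, 3, ... to every other cell, nearest first.

    Assumption: distance is Manhattan distance; ties go to the lower
    row index, then the lower column index.
    """
    others = sorted(
        (abs(rr - r) + abs(cc - c), rr, cc)
        for rr in range(rows)
        for cc in range(cols)
        if (rr, cc) != (r, c)
    )
    return {(rr, cc): label for label, (_, rr, cc) in enumerate(others, start=2)}


def connected_sources(rows: int, cols: int, r: int, c: int,
                      f_input: int) -> List[Cell]:
    """Cells whose label satisfies label <= f_input, in sorted order."""
    labels = label_by_closeness(rows, cols, r, c)
    return sorted(cell for cell, label in labels.items() if label <= f_input)


if __name__ == "__main__":
    # Hypothetical 4x4 array, looking at the CFU in row 1, column 1.
    for f_input in (2, 3, 4, 5):
        print(f_input, connected_sources(4, 4, 1, 1, f_input))
```

Raising `f_input` only ever adds sources, so the connection sets for Finput = 2, 3, 4, and 5 are nested, which matches the progression the figure steps through.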




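Clique Topology

Label by the row and column: CFUs are labeled by row and column (labels 2 to 6 in the figure), and a CFU is connected as an input when Label <= Finput.

Figure: the labeled array, shown for Finput = 5.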






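Directional Topology

Label by the next row and column: CFUs are labeled by the next row and column (labels 2 to 7 in the figure), and a CFU is connected as an input when Label <= Finput.

Figure: the labeled array, shown for Finput = 5.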






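Heterogeneous Topology

Label by the third column: CFUs are labeled by the third column (labels 2 to 7 in the figure), and a CFU is connected as an input when Label <= Finput.

Figure: the labeled array, shown for Finput = 5.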


Experimental Results

Ten benchmark kernels are used for comparison. Each is a single loop containing between 18 and 184 operations per iteration.

How does Finput affect IPC (instructions per cycle)?

How does Finput affect area?
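The slides do not say how IPC is computed. For a modulo-scheduled loop, a common steady-state estimate is the number of operations per iteration divided by the initiation interval, ignoring the start-up and wind-down of the pipeline. The sketch below uses that estimate. The initiation intervals and the area figure are hypothetical, and only the operation counts 18 and 184 come from the text above.

```python
def steady_state_ipc(ops_per_iteration: int, ii: int) -> float:
    """Steady-state IPC of a modulo-scheduled loop: operations per iteration / II."""
    if ii <= 0:
        raise ValueError("initiation interval must be positive")
    return ops_per_iteration / ii


def ipc_per_area(ipc: float, area: float) -> float:
    """Performance per unit area; `area` is in arbitrary units."""
    if area <= 0:
        raise ValueError("area must be positive")
    return ipc / area


if __name__ == "__main__":
    # 18 and 184 are the smallest and largest kernels mentioned in the text;
    # the initiation intervals (3 and 8) are made-up values for illustration.
    small = steady_state_ipc(18, 3)
    large = steady_state_ipc(184, 8)
    print(small, large)
    # Hypothetical area of 2.0 units for the same fabric.
    print(ipc_per_area(small, 2.0))
```

A richer interconnect can let the scheduler reach a smaller initiation interval, which raises IPC, but it also costs area, which is why the ratio of the two is a useful combined measure.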



Figure: Finput vs. IPC

Figure: Finput vs. Area

Overall Results

Figure: all topologies.

Figure: Finput vs. IPC/Area.

Different interconnect topologies affect both performance and area.

A partially interconnected fabric is better than a fully connected fabric.

Software pipelining is affected by the amount of flexibility in the interconnect architecture.

References

Steven J.E. Wilton, Noha Kafafi, Bingfeng Mei, Serge Vernalde, “Interconnect Architectures for Modulo-Scheduled Coarse-Grained Reconfigurable Arrays”, IEEE, 2004.

Frank Bouwens, Mladen Berekovic, Andreas Kanstein, Georgi Gaydadjiev, “Architectural Exploration of the ADRES Coarse-Grained Reconfigurable Array”, 2007.

Reiner Hartenstein, “Coarse Grain Reconfigurable Architectures”, 2001.

Lu Wan, Chen Dong, Deming Chen, “A New Coarse-Grained Reconfigurable Architecture with Fast Data Relay and Its Compilation Flow”.

Thanks

