Interconnect Architectures

Post on 16-Apr-2017

Interconnect Architectures for Modulo-Scheduled Coarse-Grained Reconfigurable Arrays

Ahmed Hassan Mohammed

Outline

Introduction

Reconfigurable Arrays

Device Architecture

Mapping Technology

Proposed Architectures

Experimental Results

Overall Results

References

Introduction

Platforms

[Figure: ASIC, reconfigurable arrays, and µProcessors plotted by performance against flexibility.]

Reconfigurable Arrays

                    Fine-grained       Coarse-grained
Purpose             General purpose    Application specific
Basic unit          LUT                ALU
Level               bit-level          word-level
Re-configurability  High overhead      Reduced overhead
Performance         Low                High

Device Architecture

Mapping Technology

[Figure: dataflow graph and architecture.]

Mapping Technology

[Figure: Iteration 1, Iteration 2, Iteration 3.]
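The way successive loop iterations overlap under modulo scheduling can be sketched in a few lines. This is a minimal illustration, not the mapping tool discussed in the talk: the initiation interval `ii`, the per-iteration schedule length `schedule_len`, and both function names are assumptions introduced here.

```python
def iteration_windows(num_iterations, ii, schedule_len):
    """Return (start_cycle, end_cycle) for each loop iteration.

    Assumption: iteration k starts ii cycles after iteration k-1 and
    occupies schedule_len consecutive cycles.
    """
    return [(k * ii, k * ii + schedule_len - 1) for k in range(num_iterations)]


def total_cycles(num_iterations, ii, schedule_len):
    """Cycles needed to finish all iterations when they overlap."""
    if num_iterations == 0:
        return 0
    return (num_iterations - 1) * ii + schedule_len


# Hypothetical numbers: 3 iterations, a new one every 2 cycles, 6 cycles each.
windows = iteration_windows(3, ii=2, schedule_len=6)
print(windows)
print(total_cycles(3, ii=2, schedule_len=6))
```

With these made-up values, the three iterations finish in 10 cycles instead of the 18 that back-to-back execution would take, because a new iteration begins before the previous one has ended.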

Proposed Architectures

Closest Topology

Clique Topology

Directional Topology

Heterogeneous Topology

Proposed Architectures

Finput: number of possible inputs for a CFU

Example: Finput = 3

Proposed Architectures

Closest Topology

Label by the closest

Labels: Label <= Finput

[Figure: labelled array diagrams for Finput = 2, Finput = 3, Finput = 4, and Finput = 5.]
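The rule Label <= Finput can be read as a selection over labelled candidate sources. The sketch below assumes that reading. The label values, the source names, and the function name `select_inputs` are hypothetical and are not taken from the slide's figure.

```python
def select_inputs(labels, finput):
    """Return the candidate sources whose label is <= finput, sorted by label.

    labels: dict mapping a candidate source name to its integer label
            (hypothetical values, invented for this example).
    """
    chosen = [(label, name) for name, label in labels.items() if label <= finput]
    return [name for label, name in sorted(chosen)]


# Hypothetical labelling of four candidate sources for one CFU.
labels = {"north": 2, "west": 3, "east": 4, "south": 5}
for finput in (2, 3, 4, 5):
    print(finput, select_inputs(labels, finput))
```

Under this reading, raising Finput only ever adds sources, so each larger setting contains the connections of the smaller ones. The topologies would then differ only in how the labels are assigned.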

Proposed Architectures

Clique Topology

Label by the row and column

Labels: Label <= Finput

[Figure: labelled array diagrams for Finput = 2, Finput = 3, Finput = 4, and Finput = 5.]

Proposed Architectures

Directional Topology

Label by the next row and column

Label <= Finput

[Figure: labelled array diagrams for Finput = 2, Finput = 3, Finput = 4, and Finput = 5.]

Proposed Architectures

Heterogeneous Topology

Label by the third column

Label <= Finput

[Figure: labelled array diagrams for Finput = 2, Finput = 3, Finput = 4, and Finput = 5.]

Experimental Results

Ten benchmark kernels are used for comparison. Each is a single loop containing between 18 and 184 operations per iteration.

How does Finput affect IPC (instructions per cycle)?

How does Finput affect area?

Experimental Results

[Figure: Finput vs. IPC]

Experimental Results

[Figure: Finput vs. Area]

Overall Results

All Topologies

Overall Results

[Figure: Finput vs. IPC/Area]
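IPC/Area folds the two questions from the experiments into a single figure of merit. A minimal sketch of the arithmetic follows, assuming IPC is instructions executed divided by cycles taken and that area is given in arbitrary units. The function names, the option names, and every number are hypothetical and do not come from the talk's results.

```python
def ipc(instructions, cycles):
    """Instructions per cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def ipc_per_area(instructions, cycles, area):
    """IPC divided by area; higher means more throughput per unit area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return ipc(instructions, cycles) / area


# Hypothetical measurements for two interconnect options (not from the talk).
candidates = {
    "option_a": {"instructions": 1200, "cycles": 400, "area": 2.0},
    "option_b": {"instructions": 1200, "cycles": 300, "area": 4.0},
}
scores = {name: ipc_per_area(**m) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In this invented case, option_b has the higher IPC but pays for it with twice the area, so option_a scores better on IPC/Area. The metric exists to expose this kind of trade-off.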

Overall Results

Different interconnect topologies affect both performance and area.

A partially interconnected fabric is better than a fully connected fabric.

Software pipelining is affected by the amount of flexibility in the interconnect architecture.

References

Steven J.E. Wilton, Noha Kafafi, Bingfeng Mei, Serge Vernalde, “Interconnect Architectures for Modulo-Scheduled Coarse-Grained Reconfigurable Arrays”, 2004 IEEE.

Frank Bouwens, Mladen Berekovic, Andreas Kanstein, Georgi Gaydadjiev, “Architectural Exploration of the ADRES Coarse-Grained Reconfigurable Array”, 2007.

Reiner Hartenstein, “Coarse Grain Reconfigurable Architectures”, 2001.

Lu Wan, Chen Dong, Deming Chen, “A New Coarse-Grained Reconfigurable Architecture with Fast Data Relay and Its Compilation Flow”.

Thanks
