Date post: 18-Jan-2018

SCORES: A Scalable and Parametric Streams-Based Communication Architecture for Modular Reconfigurable Systems

Abelardo Jara-Berrocal, Ann Gordon-Ross
NSF Center for High-Performance Reconfigurable Computing (CHREC)
Department of Electrical and Computer Engineering
University of Florida


Introduction – Parallel Computation

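[Figure: design flow in three steps: 1. System formulation, 2. Application decomposition, 3. Task allocation / system placement. A high-performance application is decomposed into a task graph with seven numbered tasks (labels: Source, FIR, FFT, Matrix, IFFT, Angle, Sink). Edges indicate communication volume (labels: 4000, 15000, 15000, 82500, 40000, 4000, 15000). Tasks are allocated to the modules uProc, MEM, DSP1, ASIC, and DSP2 (allocation labels: 1, 7; Data; 2, 6; 4; 3, 5).]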

To leverage parallel computation speedups, a system can be decomposed into smaller tasks

Parallel communication

How do designers provide efficient module communication?

Problem: Speedup can be limited by inefficient communication!

Profile 1: DSP: 0.5 ms, uProc: 2.2 ms
Profile 2: ASIC: 0.5 ms, DSP: 2.5 ms


Communication Architectures

[Figure: a) Bus and b) Network-on-Chip, each connecting uProc, MEM, DSP1, ASIC, and DSP2; in b) the modules attach to NoC nodes.]

Bus: Advantages
• Very well known
• Smaller hardware overhead
• SoC standards: CoreConnect®, AMBA®, Wishbone

Bus: Disadvantages
• Does not scale well as number of modules increases
• High power consumption due to long wires
• Cross-talk issues

Network-on-Chip (NoC): Advantages
• Scalable
• Very high bandwidth
• Wires are broken in smaller segments
• Multiple and simultaneous parallel communications

Network-on-Chip (NoC): Disadvantages
• Significant area overhead
• Exacerbated by store-and-forward routers
• Interfaces between modules and nodes are not standard
• Specific signals and handshaking protocols for each design


General NoC architecture

• NoC Interface
• NoC Link
• NoC Node
    – Routers (packet switching)
    – Switches (circuit switching)
• NoC Topology
    – Varies across designs
    – Commonly 2D mesh or torus [1]

[Figure: general NoC with modules MEM, uProc, DSP1, ASIC, DSP2, and I/O Slave attached to NoC nodes through NoC interfaces; nodes are connected by NoC links.]

[1] Salminen et al. Survey of Network-on-Chip Proposals. White paper, OCP-IP, March 2008.


Motivation

• Relevant NoC metrics:
    – Throughput
    – Latency
    – Area
    – Power
• 2D Mesh NoC
    – High throughput
    – Low latency
    – High communication parallelism
• Due to these advantages, some commercial 2D NoCs for ASICs have appeared:
    – Arteris®
• How about NoC implementations in FPGAs?
    • FPGAs are increasingly used in digital designs
        – Reconfigurable
        – Lower cost than ASICs
    • NoC area overhead becomes a problem
        – Area of a 3x3 2D Mesh NoC consumed 28.72% of a Xilinx V2P30 [2] (for a maximum throughput of 9.5 Gbps for the complete 3x3 2D NoC)
    • Problem is exacerbated with low capacity & low cost FPGA devices

[Figure: 3x3 2D mesh NoC with nodes N1 to N9, each node attached to a module; Arteris NoC.]

[2] B. Sethuraman, P. Bhattacharya, J. Khan, R. Vemuri. LiPaR: A light-weight parallel router for FPGA-based networks-on-chip. ACM Great Lakes Symposium on VLSI, 2005, pp. 452-457.


SCORES – Contributions

• SCORES = Scalable Communication Architecture for Reconfigurable Embedded Systems
• Main contributions:
    • High throughput / bandwidth
        – Circuit switching scheme
    • Low area overhead
        – Linear topology
    • Multiple clock domains
    • Scalability
        – VHDL model with numerous architectural parameters
        – Allows customization for the communication needs of different SoCs

[Figure: reconfigurable device (FPGA) with Module 1, Module 2, and Module 3, each attached to SCORES through an interface. Clock labels: clk1, clk2, clk3 (different clock domains) and scores-clk.]

Implemented in a Xilinx VLX25 FPGA


SCORES – Top Level Design

• SCORES main components:
    • Switches – communication nodes inside SCORES
    • Interfaces – communication between modules and SCORES
    • Channels – communication links between switches and other switches or interfaces
• Modules access interfaces through local input ports and local output ports

[Figure: reconfigurable device (FPGA) with Module 1, Module 2, and Module 3 connected through interfaces to the switches inside SCORES. Clock labels: clk, clk1, clk2, clk3.]


SCORES – Parametric Architecture

[Figure: Module 1 to Module 4, each attached through an interface to a switch inside SCORES.]

• kl – number of left switch channels
• kr – number of right switch channels
• ko – number of local output ports from the interface
• ki – number of local input ports to the interface

Additional parameters
• N = number of modules
• W = width of a channel in bits

Parameters enable SCORES to conform to custom communication requirements
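
The parameter set can be captured as a small configuration record. The Python sketch below is illustrative only: the class name `ScoresParams`, its field names, the validation rule, and the `link_wires` helper are assumptions made for this example and are not taken from the SCORES VHDL model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoresParams:
    """Hypothetical container for the SCORES architectural parameters."""
    n: int   # N: number of modules
    w: int   # W: width of a channel in bits
    kl: int  # number of left switch channels
    kr: int  # number of right switch channels
    ki: int  # number of local input ports to the interface
    ko: int  # number of local output ports from the interface

    def __post_init__(self):
        # Assumption: every parameter must be a positive integer.
        for name in ("n", "w", "kl", "kr", "ki", "ko"):
            value = getattr(self, name)
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")

    def link_wires(self) -> int:
        """Data wires between two neighbouring switches.

        Assumption: a link carries kl left-going and kr right-going
        channels, each W bits wide; control signals are not counted.
        """
        return (self.kl + self.kr) * self.w


# Example values matching the customized switch reported in the results:
# 32-bit channels, 2 left and 2 right channels, 1 local input, 1 local output.
params = ScoresParams(n=4, w=32, kl=2, kr=2, ki=1, ko=1)
print(params.link_wires())
```

Keeping the parameters in one immutable record makes it easy to sweep values (for example kr from 1 upward) when exploring area and frequency trade-offs.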


SCORES – Terminology

• Producer: module which transmits data
• Consumer: module which receives data
• Streaming Data Channel (SDC):
    • Dedicated path between a producer and a consumer
    • Dynamically created and destroyed inside SCORES
    • Bidirectional path
        – Data flows from producer to consumer
        – Control synchronization signals flow from consumer to producer

[Figure: Module 1 to Module 4 with their interfaces; a Streaming Data Channel (SDC) connects a producer to a consumer.]


SCORES – Communication Phases

• Three communication phases
• Phase I: Channel establishment
    • Producer requests a path to the consumer
    • Path iteratively created inside switches between the producer and the consumer
    • If a switch has no available channels
        – Sends a DENY signal to the producer
        – Producer can drop or maintain the request
    • If successful, the Streaming Data Channel (SDC) is created between the producer and the consumer

[Figure: Module 1 to Module 4 with their interfaces; a Streaming Data Channel (SDC) connects a producer to a consumer.]
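
Channel establishment over a linear chain of switches can be sketched in a few lines of Python. This is a behavioural model under stated assumptions, not the SCORES implementation: the class names, the `"GRANT"` return value, the restriction to left-to-right requests, and the choice to undo a partial path on DENY (that is, the producer drops the request) are all hypothetical. Module `i` is assumed to attach to switch `i`.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """One switch of the linear topology."""
    free_right: int  # free right-going channels (only this direction is modelled)


@dataclass
class Scores:
    switches: list
    sdcs: dict = field(default_factory=dict)  # (producer, consumer) -> reserved hops

    def establish(self, producer: int, consumer: int) -> str:
        """Phase I: reserve one channel per hop, switch by switch."""
        if not 0 <= producer < consumer < len(self.switches):
            raise ValueError("this sketch only models producer < consumer")
        reserved = []
        for hop in range(producer, consumer):
            sw = self.switches[hop]
            if sw.free_right == 0:
                # No channel available: undo the partial path, signal DENY.
                for done in reserved:
                    self.switches[done].free_right += 1
                return "DENY"
            sw.free_right -= 1
            reserved.append(hop)
        self.sdcs[(producer, consumer)] = reserved
        return "GRANT"

    def release(self, producer: int, consumer: int) -> None:
        """Channel release: free every hop of an established SDC."""
        for hop in self.sdcs.pop((producer, consumer)):
            self.switches[hop].free_right += 1


net = Scores([Switch(free_right=1) for _ in range(4)])
print(net.establish(0, 2))  # reserves a channel in switches 0 and 1
print(net.establish(2, 3))  # independent SDC, can coexist with the first
print(net.establish(1, 3))  # switch 1 has no free channel left
```

A producer that maintains its request instead of dropping it would simply call `establish` again until a channel is freed by `release`.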


SCORES – Communication Phases

• Phase II: Streaming transmission
    • Pipelined operation
    • If consumer buffer is full
        – Consumer asserts “Full” to inform producer to pause transmission
    • Interfaces built around asynchronous FIFOs
        – Eases crossing different clock domains
• Phase III: Channel release
    • Producer deasserts its request
    • Path between the producer and the consumer is iteratively destroyed

[Figure: Module 1 to Module 4 with their interfaces; a Streaming Data Channel (SDC) with a register stage connects a producer to a consumer.]
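
The "Full" handshake of the streaming phase can be illustrated with a cycle-by-cycle toy model. The sketch assumes a single shared cycle counter, a consumer that reads one word every `consumer_period` cycles to imitate a slower clock domain, and a bounded FIFO of `fifo_depth` words; the function name and all parameters are hypothetical, and real asynchronous-FIFO clock-domain crossing is not modelled.

```python
from collections import deque


def stream(words, fifo_depth, consumer_period):
    """Return (received words, number of cycles the producer was paused)."""
    if fifo_depth < 1 or consumer_period < 1:
        raise ValueError("fifo_depth and consumer_period must be >= 1")
    fifo = deque()
    pending = deque(words)
    received = []
    stalled_cycles = 0
    cycle = 0
    while pending or fifo:
        # Consumer side: read one word on its own (slower) schedule.
        if fifo and cycle % consumer_period == 0:
            received.append(fifo.popleft())
        # "Full" is asserted while the consumer buffer has no room.
        full = len(fifo) >= fifo_depth
        if pending:
            if full:
                stalled_cycles += 1  # producer pauses transmission
            else:
                fifo.append(pending.popleft())
        cycle += 1
    return received, stalled_cycles


data, stalls = stream(range(8), fifo_depth=2, consumer_period=3)
print(data, stalls)
```

With a consumer as fast as the producer (`consumer_period=1`) the buffer never fills in this model, so the producer is never paused; a slower consumer causes stalls but no data is lost or reordered.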


SCORES – Simultaneous Data Transfers

[Figure: Switch 1 to Switch 4, each with an interface; the switches contain input registers and MUXes, and free channels are marked.]

• Set of FSM controllers running at each switch
• Allows SCORES to establish and operate multiple SDCs in parallel
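
The slides do not detail the controller FSMs, so the following Python sketch is purely hypothetical: it assumes one controller per channel whose states mirror the three communication phases, and the state names and the `request`, `granted`, and `denied` inputs are invented for illustration. It shows how independent controllers let several SDCs progress at the same time.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # channel is free
    ESTABLISH = auto()  # Phase I: request is being forwarded
    STREAM = auto()     # Phase II: SDC is carrying data
    RELEASE = auto()    # Phase III: path is being torn down


def step(state, request, granted=False, denied=False):
    """Next state of one hypothetical per-channel controller."""
    if state is State.IDLE:
        return State.ESTABLISH if request else State.IDLE
    if state is State.ESTABLISH:
        if not request or denied:
            return State.IDLE  # a maintained request retries from IDLE
        return State.STREAM if granted else State.ESTABLISH
    if state is State.STREAM:
        return State.STREAM if request else State.RELEASE
    return State.IDLE  # RELEASE completes in one step in this sketch


# Two controllers in the same switch advance independently.
controllers = [State.IDLE, State.IDLE]
controllers = [step(s, request=True) for s in controllers]
controllers[0] = step(controllers[0], request=True, granted=True)
controllers[1] = step(controllers[1], request=True, denied=True)
print(controllers)
```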


Results – Clock Frequency

[Figure: three plots of frequency (MHz) versus:
• Number of right switch channels (Kr) (1 left switch channel)
• Number of left and right switch channels (Kr, Kl) (1 local input and 1 local output port per switch)
• Number of local input and output ports (Ki, Ko) per switch (1 left and 1 right switch channel)]

• The maximum clock frequency achieved by SCORES determines its maximum throughput

A customized SCORES switch with 32-bit channels, 2 left and right switch channels, and 1 local input and 1 local output port operates at 254 MHz (throughput = 8.0 Gbps, post place-and-route timing report).
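
The link between clock frequency and throughput can be checked with a one-line calculation. The sketch assumes that a channel moves one W-bit word per clock cycle, which the slides do not state explicitly; the function name is hypothetical.

```python
def channel_throughput_gbps(width_bits: int, clock_mhz: float) -> float:
    """Peak throughput of one channel, assuming one word per clock cycle."""
    return width_bits * clock_mhz * 1e6 / 1e9


print(channel_throughput_gbps(32, 254))
```

For a 32-bit channel at 254 MHz this gives roughly 8.1 Gbps, in line with the reported 8.0 Gbps; the slides do not explain the small difference.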


Results – Area

[Figure: three plots of area (slices) versus:
• Number of right switch channels (Kr) (1 left switch channel)
• Number of left and right switch channels (Kr, Kl) (1 local input and 1 local output port per switch)
• Number of local input and output ports (Ki, Ko) per switch (1 left and 1 right switch channel)]

A customized SCORES switch with 32-bit channels, 2 left and right switch channels, and 1 local input and 1 local output port consumes 315 slices (1.41% of Virtex 4 VLX25).


Conclusions

• We developed SCORES (Scalable Communication Architecture for Reconfigurable Embedded Systems), a highly parametric communication architecture
• SCORES contributions:
    – Low area overhead (315 slices for a 32-bit switch with multiple ports)
    – Modules can run at different and independent clock frequencies
    – Highly parametric design, which enables architecture optimization
• Future work
    – Optimization of switch FSM controllers
    – Development of algorithms for module placement inside SCORES
    – Tools for automatic determination of SCORES parameter values


Questions

