
Date post: 20-Apr-2018

Reconfigurable Cell Array for DSP Applications

Chenxin Zhang

Department of Electrical and Information Technology, Lund University, Sweden

ETI180 DSP-design Dec. 06th, 2011

Department of Electrical and Information Technology, Lund University

Outline

• Reconfigurable computing

• Coarse-grained reconfigurable cell array

– Processing cell

– Memory cell

– Network router cell

– System reconfiguration

• Case studies

– Reconfigurable FIR

– Reconfigurable FFT processor

– Multi-standard OFDM coarse time synchronization


Reconfigurable computing

• Updates the data path in addition to the control flow.

• Combines flexibility with high performance at a feasible hardware cost.

• Software-centric programming approach.

• Coarse-grained granularity – trade-off between efficiency, flexibility, and programmability.

• Dynamic reconfigurability.

High performance real-time DSP computing

[Figure: a platform with application, media, and cellular processors supporting multiple standards: GPS, BT, WLAN, DVB-H, WiMAX, LTE-A, WCDMA, 5G?]


Software-defined hardware

• Hardware sharing

– Accelerators: poor hardware reusability

– Reconfigurable architecture

+ Multi-task

+ Multi-standard

+ Multi-algorithm

− Control overhead, e.g. area, power.

[Figure: processing chain with blocks A, B, C, D]

Performance vs. Flexibility

• Specialized hardware (ASIC)

+ High performance, small size, low power

- Less flexible, sensitive to manufacturing defects

- High NRE cost

• Standard processor (GPP, DSP…)

+ Flexible, short design time

- Lack of computation capacity

• Fine-grained reconfigurable architecture (FPGA)

+ High calculation capacity, flexible

- Routing overhead, high power consumption

- Hardware-oriented design approach

Application-specific DSP: Tensilica ConnX Baseband Engine

Tabula Spacetime

• Ultra-rapid reconfiguration: multi-GHz rates

• 2.5x logic density

• 3.7x DSP performance

Coarse-grained reconfigurable architecture (CGRA)

• High calculation capacity and flexibility

• Software-oriented: relatively fast development

• Tolerance to manufacturing defects

• Sacrificed area and energy efficiency compared to ASICs

• Sacrificed mapping flexibility compared to FPGAs

Related work

• ALU clusters: MathStar FPOA, RICA…

– Instruction level, data level parallelism

– SIMD or VLIW

• Processor array: RAW, WPPA, REMARC…

– Instruction level, data level, and task level parallelism

– MIMD

• Hybrid structure: ADRES, PACT XPP…

– Instruction level, data level, and task level parallelism

– SIMD or VLIW and MIMD

– Combined complexity?

Courtesy: MathStar, “FPOA architecture guide”.

Courtesy: D. Kissler et al., “A Highly Parameterizable Parallel Processor Array Architecture”.

Courtesy: PACT, “XPP-III Processor Overview”.

System infrastructure

• An array of resource cells.

• Heterogeneous cell array:

– Processing cell

– Memory cell

– Accelerator (e.g. no configuration)

• Hierarchical cell array.

[Figure: cell array with cells labeled R, Addr. gen, and Coeff. gen]

Resource cell

• Dedicated local interconnections:

– High data throughput

• Hierarchical global routing network:

– Flexible global data transmission

– External data access

– Global cell (re)configuration

• Data driven synchronization

• Single-Cycle-Per-Hop latency

• AMBA 4 AXI4-stream protocol

• GALS network data transmission

[Figure: resource cell (RC) with local ports L0 to L3, global port G0, and a cell labeled R]

Processing cell

• Processing core

– ALU, DSP, SIMD, VLIW, CORDIC...

– Implicit load-store operations in all instructions.

– Run-time control and conditional reconfiguration.

– In-cell NoC supervision and reconfiguration.

• Processing shell

– Network adapter

[Figure: processing cell with inputs P1 and P2 and output P3 = f(P1, P2)]

Example 1: Generic signal processing cell

• 4 pipeline stages.

• Hybrid Load-Store & Memory-Memory architecture.

• Compact program size (memory references).

• With external memory cells:

– Complex addressing modes, e.g. memory indirect, auto-increment.

– Flexible usage: program/data memory, processor stack, (cache).

• Single-cycle delayed branch.

• Zero-delay conditional inner loop control.

Example 2: Dataflow processing cell (I)

[Figure: block diagram of the dataflow processing cell. Labels: local IO ports L0, L1, ..., Lx and global IO port G; PC, branch, and operation controller; register; pipeline registers IF/ID, ID/EXE, EXE/WB; input arrangement MUX, arith/logic selection, output arrangement MUX, output MUX.]

Example 2: Dataflow processing cell (II)

• SIMD/VLIW-like operation:

– 2/4-way 16/8-bit independent data processing

– Multi-level data processing (implicit prolog & epilog processing)

• Dual-operand instruction set:

– Dual-OpCode & Dual-Operand: e.g. ADDSUB R[d1], R[d2], R[s1], R[s2]

– Vector operation option: e.g. complex number arithmetic

• Dynamic data path reconfiguration

• Conditional instruction executions
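As a rough illustration of the dual-opcode, dual-operand format, the sketch below models `ADDSUB R[d1], R[d2], R[s1], R[s2]` as a single instruction that produces a sum and a difference, which is the shape of an FFT butterfly. The slides do not define the instruction semantics, so the operand roles, the 16-bit wrap-around, and the names `addsub`, `to_signed16`, and `regs` are assumptions made for illustration.

```python
MASK16 = 0xFFFF  # assumed 16-bit data path (not specified for this instruction)

def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value

def addsub(regs, d1, d2, s1, s2):
    """Hypothetical model of ADDSUB R[d1], R[d2], R[s1], R[s2]."""
    a, b = regs[s1], regs[s2]      # read both sources before writing
    regs[d1] = to_signed16(a + b)  # first opcode: add
    regs[d2] = to_signed16(a - b)  # second opcode: subtract
    return regs

regs = [0] * 8
regs[1], regs[2] = 7, 3
addsub(regs, d1=3, d2=4, s1=1, s2=2)
print(regs[3], regs[4])  # 10 4
```

Reading both sources before either write keeps the result correct when a destination register is also a source.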


Dataflow processing cell: Dynamic data path reconfiguration

[Figure: configurable data path stages: input arrangement MUX, arith/logic selection, output arrangement MUX, output MUX]

Dataflow processing cell: Run-time data arrangement (II)

• Complex number multiplication vs. real number multiplication

– MUL R3, R1, R2 ; R3 = R1 * R2, where {a, b} is stored in R1 and {c, d} is stored in R2.
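The difference between the two interpretations of the same MUL instruction can be sketched as follows. This is a minimal model, assuming that each register holds a pair of values and that the complex mode reads {a, b} as a + jb; the function names are hypothetical, and fixed-point scaling and word lengths are left out.

```python
def mul_real(r1, r2):
    """Lane-wise real multiplication: each half is an independent operand."""
    (a, b), (c, d) = r1, r2
    return (a * c, b * d)

def mul_complex(r1, r2):
    """Complex multiplication with {a, b} read as a + jb and {c, d} as c + jd."""
    (a, b), (c, d) = r1, r2
    return (a * c - b * d, a * d + b * c)

r1, r2 = (1, 2), (3, 4)
print(mul_real(r1, r2))     # (3, 8)
print(mul_complex(r1, r2))  # (-5, 10)
```

The complex case needs the cross products a*d and b*c, so the operands have to be rearranged between lanes before the multipliers, which is the kind of work a run-time data arrangement stage would take on.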


Memory cell (I)


Memory cell (II): Memory descriptor

Memory cell (III)

[Figure: packing of in-phase (I) and quadrature (Q) samples 1 to 4 into one memory word at address 'X', panels (a) to (d). Labels: sign; 12 bits -> 4 bits; shift by 0, 20, 16, and 4 with mask; logic "or"; after 4 iterations; PC0 -> MC0.]
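The shift, mask, and "or" steps named in the figure can be sketched as below. This is only an illustration of the general technique: it assumes that each 12-bit sample is reduced to its 4 most significant bits and that four I/Q pairs fill one 32-bit word. The field order chosen here is arbitrary and does not reproduce the layout or shift amounts of the slide, and the function names are hypothetical.

```python
def quantize_12_to_4(sample):
    """Keep the 4 most significant bits (sign included) of a 12-bit value."""
    return (sample >> 8) & 0xF

def pack_iq(samples):
    """Pack four (I, Q) pairs of 12-bit samples into one 32-bit word."""
    word = 0
    for n, (i, q) in enumerate(samples):
        word |= quantize_12_to_4(i) << (16 + 4 * n)  # I fields in the upper half
        word |= quantize_12_to_4(q) << (4 * n)       # Q fields in the lower half
    return word

samples = [(0x7FF, 0x800), (0x123, 0xABC), (0x000, 0xFFF), (0x800, 0x400)]
print(hex(pack_iq(samples)))  # 0x80174fa8
```

One iteration of the loop handles one I/Q pair, so the word is complete after four iterations.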

Memory cell (IV): Reconfiguration

• Individual memory descriptor (DSC) loading & tracing

• Memory DSC execution program:

• Memory DSC execution mode: restart, resume

• Memory data dump (debug)

Network router cell (I)

• Cell structure:

– Decision unit

– Routing structure:

• Parallel network

• MUX-DEMUX switch

– Output packet queue (FIFOs)

Network router cell (II): Decision unit

• Static routing table

• Managing data transactions:

– Check in

– Packet arbitration (MUX-DEMUX switch)

• Fixed

• Round-robin

• Data broadcast

– Configure routing path
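The two arbitration policies listed above can be sketched in software as follows. This is a behavioural model under assumptions: the slides do not describe how the decision unit implements arbitration, so the request encoding (one boolean per input port) and the names `fixed_priority` and `RoundRobinArbiter` are hypothetical.

```python
def fixed_priority(requests):
    """Grant the lowest-numbered requesting input, or None if idle."""
    for port, wants in enumerate(requests):
        if wants:
            return port
    return None

class RoundRobinArbiter:
    """Grant requesters in rotating order, starting after the last winner."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.last = num_ports - 1  # so that port 0 is checked first

    def grant(self, requests):
        for offset in range(1, self.num_ports + 1):
            port = (self.last + offset) % self.num_ports
            if requests[port]:
                self.last = port
                return port
        return None

requests = [True, True, False, True]  # inputs 0, 1, and 3 want the same output
arbiter = RoundRobinArbiter(len(requests))
print([fixed_priority(requests) for _ in range(4)])  # [0, 0, 0, 0]
print([arbiter.grant(requests) for _ in range(4)])   # [0, 1, 3, 0]
```

With fixed priority, input 0 wins every cycle as long as it keeps requesting, while round-robin rotates the grant so that no requesting input is starved.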

[Figure: two action lists with candidate transactions, one for the parallel network and one for the MUX-DEMUX switch. Rows are inputs In(0) to In(3) and In(def), columns are outputs O(0) to O(4), and entries are marked o or x.]

Static & Dynamic configuration (I)

[Figure: system overview with a master (icache, dcache), MPMC, memory, stream controller (StreamCtrl), configuration controller (Conf.Ctrl), and the cell array (cells labeled R)]

Static & Dynamic configuration (II)

[Figure: cell array with cells labeled R and M1 to M4]

Case study: Reconfigurable FIR

• FIR filter

– Processing cell: MAC

– Memory cell: Input data FIFO, coefficient ROM

• Time-multiplexed structure for area-driven applications.

• Unfolding (parallelization) to improve processing throughput.

• High-precision computations.
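The trade-off between the time-multiplexed and the unfolded FIR structure can be sketched as a software model. It is an illustration only, assuming that a single MAC steps through all taps in the time-multiplexed case and that unfolding spreads the tap loop over several MACs; the function names and the `factor` parameter are hypothetical and do not describe the actual cell mapping.

```python
def fir_time_multiplexed(samples, coeffs):
    """One MAC reused for every tap: len(coeffs) MAC steps per output."""
    history = [0] * len(coeffs)            # models the input data FIFO
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0
        for h, c in zip(history, coeffs):  # one MAC operation per step
            acc += h * c
        outputs.append(acc)
    return outputs

def fir_unfolded(samples, coeffs, factor=2):
    """Tap loop unfolded over `factor` MACs that could run side by side."""
    history = [0] * len(coeffs)
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        partial = [0] * factor             # one accumulator per MAC
        for k, (h, c) in enumerate(zip(history, coeffs)):
            partial[k % factor] += h * c
        outputs.append(sum(partial))
    return outputs

coeffs = [1, 2, 3]
samples = [1, 0, 0, 2]
print(fir_time_multiplexed(samples, coeffs))  # [1, 2, 3, 2]
print(fir_unfolded(samples, coeffs))          # [1, 2, 3, 2]
```

Both functions compute the same filter. In hardware the unfolded version would need roughly `len(coeffs) / factor` steps per output at the cost of `factor` MAC units and a final addition.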


Case study: Reconfigurable FFT processor

• Radix-2² structure

• Folding

Radix-2² FFT building block

• Basic radix-2² FFT building block

• A 2,048-point radix-2² pipeline FFT

Radix-2² pipeline FFT

• Simple mapping

– Simple to scale up.

– Local communication only.

– High storage capacity demand in each single memory cell.

• Simple mapping with concatenated memory cells

– Low storage capacity demand in each single memory cell.

– Global data communications.
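The storage demand behind the two mappings can be estimated with a short calculation. The sketch assumes a single-path delay feedback (SDF) pipeline, which the slides do not state, in which the feedback memory of each butterfly stage is half as deep as that of the previous one. The memory cell capacity of 256 words and the function names are hypothetical.

```python
def sdf_delay_depths(n_points):
    """Feedback delay depth (in samples) of each butterfly stage."""
    depths = []
    depth = n_points // 2
    while depth >= 1:
        depths.append(depth)
        depth //= 2
    return depths

def cells_needed(depth, cell_capacity):
    """Memory cells to concatenate so one delay line fits (ceiling division)."""
    return -(-depth // cell_capacity)

depths = sdf_delay_depths(2048)
print(depths)                                  # [1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1]
print(sum(depths))                             # 2047
print([cells_needed(d, 256) for d in depths])  # [4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Under these assumptions the first stage alone needs 1,024 words, so a one-cell-per-stage mapping forces every memory cell to be sized for the worst case, while concatenating smaller cells spreads that demand at the price of non-local communication.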

Time-multiplexed FFT (I)

Time-multiplexed FFT (II)

• FFT benchmark comparison

– Rapid system reconfiguration: 40 ns @ 300 MHz

– High performance: 2.5x vs. DSPs, 6.5x vs. GPPs

Architecture          fmax [MHz]   FFT size [point]   Execution time [cc]   Code size [byte]   Reconfiguration code size [byte]
CGRA                  534          256 / 1024         2,242 / 9,943         1,032              30
Texas TMS-320VC5502   300          256 / 1024         5,389 / 25,921        462                462 (code reload)
ARM926EJ-S            276          256 / 1024         13,194 / 66,196       -                  -
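The quoted speed-up figures can be checked against the cycle counts in the benchmark table with a few lines of arithmetic. The sketch compares clock cycles only and ignores the different clock frequencies; the dictionary layout and the function name `speedup` are illustrative choices.

```python
# Execution time in clock cycles, taken from the benchmark table.
cycles = {
    "CGRA": {256: 2242, 1024: 9943},
    "Texas TMS-320VC5502": {256: 5389, 1024: 25921},
    "ARM926EJ-S": {256: 13194, 1024: 66196},
}

def speedup(baseline, size):
    """Cycle-count ratio of a baseline to the CGRA for one FFT size."""
    return cycles[baseline][size] / cycles["CGRA"][size]

for name in ("Texas TMS-320VC5502", "ARM926EJ-S"):
    ratios = [speedup(name, size) for size in (256, 1024)]
    print(name, [round(r, 2) for r in ratios])

# 40 ns at 300 MHz corresponds to 12 clock cycles.
reconfig_cycles = round(40e-9 * 300e6)
print(reconfig_cycles)  # 12
```

The cycle ratios come out at roughly 2.4 to 2.6 against the DSP and 5.9 to 6.7 against the ARM core, which is close to the 2.5x and 6.5x quoted on the slide.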


Case study: Multi-standard OFDM synchronization

• Multiple wireless radio standards

• Concurrent data stream processing

• Coarse Time Synchronization

• Carrier Frequency Offset (CFO) estimation

[Equations: estimators written in terms of arg{·} and γ(θ); the full expressions are not recoverable.]
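Coarse time synchronization and CFO estimation can be illustrated with one widely used approach, cyclic-prefix correlation. The sketch assumes a metric γ(θ) that sums r[k]·r*[k+N] over one prefix length, takes the timing estimate as the θ that maximizes |γ(θ)|, and derives the CFO from the phase of γ at that point. The slides' own equations are not recoverable, so this is not necessarily the method used there (in particular, the sign-bit acquisition is not modelled), and all names and parameter values are hypothetical.

```python
import cmath

def cp_correlation(r, theta, n_fft, n_cp):
    """gamma(theta): correlate n_cp samples with their copies n_fft later."""
    return sum(r[k] * r[k + n_fft].conjugate()
               for k in range(theta, theta + n_cp))

def coarse_sync(r, n_fft, n_cp):
    """Return (timing estimate, CFO estimate in subcarrier spacings)."""
    last = len(r) - n_fft - n_cp
    gammas = [cp_correlation(r, t, n_fft, n_cp) for t in range(last + 1)]
    theta = max(range(len(gammas)), key=lambda t: abs(gammas[t]))
    cfo = -cmath.phase(gammas[theta]) / (2 * cmath.pi)
    return theta, cfo

# Synthetic test signal: silence, one OFDM-like symbol with prefix, silence.
n_fft, n_cp, offset, true_cfo = 64, 16, 25, 0.1
body = [cmath.exp(1j * cmath.pi * k * k / n_fft) for k in range(n_fft)]
symbol = body[-n_cp:] + body                 # cyclic prefix + useful part
signal = [0j] * offset + symbol + [0j] * 40
rx = [s * cmath.exp(2j * cmath.pi * true_cfo * k / n_fft)
      for k, s in enumerate(signal)]

theta, cfo = coarse_sync(rx, n_fft, n_cp)
print(theta, round(cfo, 3))  # 25 0.1
```

The test signal is noise-free and uses a constant-amplitude sequence, so the correlation peak is unique; with noise and multipath the peak becomes less sharp and the estimates would only be approximate.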


Implementation results (I)

• 65 nm low-power regular VT CMOS:

– Area: 0.48 mm²

– Clock frequency: 534 MHz

• Adaptive word length scheduling.

• Adoption of different algorithms, e.g. novel sign-bit OFDM acquisition.

Summary

• Reconfigurable cell array enables hardware sharing at different levels, i.e., task-, function-, and algorithm-level.

• Coarse-grained reconfigurable cell array comprises distributed processing and memory cells, and a hierarchical NoC structure.

• In-cell dynamic reconfiguration enables fast context switching.

