Date post: 18-Sep-2020

Reconfigurable Cell Array for DSP Applications

Chenxin Zhang

Department of Electrical and Information Technology, Lund University, Sweden

ETI180 DSP-design Dec. 06th, 2011

Department of Electrical and Information Technology, Lund University

Outline

• Reconfigurable computing

• Coarse-grained reconfigurable cell array

– Processing cell

– Memory cell

– Network router cell

– System reconfiguration

• Case studies

– Reconfigurable FIR

– Reconfigurable FFT processor

– Multi-standard OFDM coarse time synchronization

Reconfigurable computing

• Updates the data path in addition to the control flow.

• Combines flexibility with high performance at a feasible hardware cost.

• Software-centric programming approach.

• Coarse-grained granularity: a trade-off between efficiency, flexibility, and programmability.

• Dynamic reconfigurability.

High performance real-time DSP computing

Multiple standards

[Figure: multiple standards sharing one platform: GPS, cellular, BT, WLAN, DVB-H, WiMAX, LTE-A, WCDMA and "5G?", next to the application and media processors.]

Software-defined hardware

• Hardware sharing

– Accelerators: poor hardware reusability

– Reconfigurable architecture

+ Multi-task

+ Multi-standard

+ Multi-algorithm

− Control overhead, e.g. area, power.

[Figure: processing chain with blocks A, B, C and D.]

Performance vs. Flexibility

• Specialized hardware (ASIC)

+ High performance, small size, low power

- Less flexible, sensitive to manufacturing defects

- High NRE cost

• Standard processor (GPP, DSP…)

+ Flexible, Short design time

- Lack of computation capacity

• Fine-grained reconfigurable architecture (FPGA)

+ High calculation capacity, flexible

- Routing overhead, high power consumption

- Hardware oriented design approach

Application specific DSP: Tensilica ConnX Baseband Engine

Tabula Spacetime

• Ultra-rapid reconfiguration: multi-GHz rates

• 2.5x logic density

• 3.7x DSP performance

Coarse-grained reconfigurable architecture

• High calculation capacity & flexible

• Software oriented: relatively fast development

• Tolerance to manufacturing defects

• Sacrificed area & energy efficiency compared to ASICs

• Sacrificed mapping flexibility compared to FPGAs

CGRA

Related work

• ALU clusters: MathStar FPOA, RICA…

– Instruction level, data level parallelism

– SIMD or VLIW

• Processor array: RAW, WPPA, REMARC…

– Instruction level, data level, and task level parallelism

– MIMD

• Hybrid structure: ADRES, PACT XPP…

– Instruction level, data level, and task level parallelism

– SIMD or VLIW and MIMD

– Combined complexity?

Courtesy: MathStar “FPOA architecture guide”.

Courtesy: D. Kissler et al. “A Highly Parameterizable Parallel Processor Array Architecture”.

Courtesy: PACT: “XPP-III Processor Overview”.

System infrastructure

• An array of resource cells.

• Heterogeneous cell array:

– Processing cell

– Memory cell

– Accelerator (e.g. no configuration)

• Hierarchical cell array.

[Figure: cell array with routers (R); labeled blocks include Addr. gen and Coeff. gen.]

Resource cell

• Dedicated local interconnections:

– High data throughput

• Hierarchical global routing network:

– Flexible global data transmission

– External data access

– Global cell (re)configuration

• Data driven synchronization

• Single-Cycle-Per-Hop latency

• AMBA 4 AXI4-stream protocol

• GALS network data transmission

[Figure: resource cell (RC) with local ports L0 to L3, global port G0 and router (R).]

Processing cell

• Processing core

– ALU, DSP, SIMD, VLIW, CORDIC...

– Implicit load-store operations in all instructions.

– Run-time control and conditional reconfiguration.

– In-cell NoC supervision and reconfiguration.

• Processing shell

– Network adapter

[Figure: processing cell with inputs P1 and P2 and output P3 = f(P1, P2).]

Example 1: Generic signal processing cell

• 4 pipeline stages.

• Hybrid Load-Store & Memory-Memory architecture.

• Compact program size (memory references).

• With external memory cells:

– Complex addressing modes, e.g. memory indirect, auto-increment.

– Flexible usage: program/data memory, processor stack, (cache).

• Single-cycle delayed branch.

• Zero-delay conditional inner loop control.

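Example 2: Dataflow processing cell (I)

[Figure: block diagram of the dataflow processing cell. Labeled blocks: local IO ports L0, L1 ... Lx and global IO port G; PC, branch and register blocks; operation controller; pipeline registers IF/ID, ID/EXE and EXE/WB; input arrangement MUX, arithmetic/logic selection, output arrangement MUX and output MUX.]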

Example 2: Dataflow processing cell (II)

• SIMD/VLIW-like operation:

– 2/4-way 16/8-bit independent data processing

– Multi-level data processing (implicit prolog & epilog processing)

• Dual-operand instruction set:

– Dual-OpCode & Dual-Operand: e.g. ADDSUB R[d1], R[d2], R[s1], R[s2]

– Vector operation option: e.g. complex number arithmetic

• Dynamic data path reconfiguration

• Conditional instruction executions
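The dual-opcode instruction and the sub-word operation listed above can be pictured with a small Python model. This is only a sketch under assumptions: the slides do not define the semantics of ADDSUB, so the model takes it to write the sum to R[d1] and the difference to R[d2]; the 32-bit register width, the wrap-around lane arithmetic and the function names are likewise hypothetical.

```python
MASK16 = 0xFFFF

def addsub(regs, d1, d2, s1, s2):
    """Assumed semantics of ADDSUB R[d1], R[d2], R[s1], R[s2]:
    one instruction applies two opcodes to the same operand pair."""
    a, b = regs[s1], regs[s2]      # read both sources before any write
    regs[d1] = a + b               # first opcode: add
    regs[d2] = a - b               # second opcode: subtract

def simd2_add16(x, y):
    """2-way 16-bit add on 32-bit words; each lane wraps independently
    (assumed behavior, no carry crosses the lane boundary)."""
    lo = ((x & MASK16) + (y & MASK16)) & MASK16
    hi = (((x >> 16) & MASK16) + ((y >> 16) & MASK16)) & MASK16
    return (hi << 16) | lo

regs = {1: 7, 2: 3, 3: 0, 4: 0}
addsub(regs, 3, 4, 1, 2)                       # R3 = 7 + 3, R4 = 7 - 3
packed = simd2_add16(0x0001FFFF, 0x00010001)   # low lane wraps to 0
```

A sum/difference pair from one operand pair is the shape of an FFT butterfly, which may be why such an instruction is attractive here.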

[Figure: data path with input arrangement MUX, arithmetic/logic selection, output arrangement MUX and output MUX.]

Dataflow processing cell: Dynamic data path reconfiguration

[Figure: data path with input arrangement MUX, arithmetic/logic selection, output arrangement MUX and output MUX.]

Dataflow processing cell: Run-time data arrangement (II)

• Complex number multiplication vs. Real number multiplication

– MUL R3, R1, R2 ; R3 = R1 * R2 where {ab} is stored in R1 and {cd} is stored in R2.
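The difference between the two modes of the same MUL instruction can be illustrated as follows. This is a sketch under assumptions: registers are modeled as Python tuples rather than packed bit fields, real mode is taken to mean two independent lane-wise products, and complex mode treats {a b} as a + jb. None of this is confirmed by the slides beyond the operand layout {ab}, {cd}.

```python
def mul_real(r1, r2):
    """Assumed real mode: two independent products, lane by lane."""
    a, b = r1
    c, d = r2
    return (a * c, b * d)

def mul_complex(r1, r2):
    """Assumed complex mode: (a + jb) * (c + jd) = (ac - bd) + j(ad + bc).
    The operands must be rearranged so each lane sees both halves."""
    a, b = r1
    c, d = r2
    return (a * c - b * d, a * d + b * c)

r1, r2 = (1, 2), (3, 4)                  # {a b} in R1, {c d} in R2
real_result = mul_real(r1, r2)
complex_result = mul_complex(r1, r2)
```

The complex case needs the cross terms ad and bc, which is the kind of operand shuffling that a run-time data arrangement stage in front of the arithmetic units would provide.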

Memory cell (I)

Memory cell (II): Memory descriptor

Memory cell (III)

[Figure, panels (a) to (d): sign bits of the in-phase (I) and quadrature (Q) samples, 12 bits -> 4 bits, sent PC0 -> MC0; after 4 iterations the bits 1(I)..4(I) and 1(Q)..4(Q) share one word at address 'X', assembled by shift-and-mask steps (shift by 0, 20, 16 and 4) combined with a logic "or".]
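The shift, mask and "or" steps named in the figure can be sketched as a sign-bit packing routine. This is a sketch only: the exact bit layout on the slide is not recoverable, so the 8-bit layout below (I signs in bits 7..4, Q signs in bits 3..0) and the function names are hypothetical; only the 12-bit sample width and the four iterations come from the slide.

```python
def sign_bit(sample, width=12):
    """Sign bit of a two's-complement sample of the given width."""
    return (sample >> (width - 1)) & 1

def pack_signs(iq_samples, width=12):
    """Pack the sign bits of four (I, Q) pairs into one word.
    Hypothetical layout: bits 7..4 hold I signs, bits 3..0 hold Q signs,
    with the first pair in the least significant bit of each field."""
    word = 0
    for n, (i, q) in enumerate(iq_samples):
        word |= sign_bit(i, width) << (4 + n)   # shift into the I field, then OR
        word |= sign_bit(q, width) << n         # shift into the Q field, then OR
    return word

# Four iterations, one (I, Q) pair of raw 12-bit samples per iteration.
samples = [(0x800, 0x001), (0x7FF, 0xFFF), (0xABC, 0x123), (0x000, 0x800)]
packed = pack_signs(samples)
```

Keeping only sign bits shrinks the storage per sample considerably, which fits the sign-bit OFDM acquisition mentioned later in the talk.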

Memory cell (IV): Reconfiguration

• Individual memory descriptor (DSC) loading & tracing

• Memory DSC execution program:

• Memory DSC execution mode: restart, resume

• Memory data dump (debug)

Network router cell (I)

• Cell structure:

– Decision unit

– Routing structure:

• Parallel network

• MUX-DEMUX switch

– Output packet queue (FIFOs)

Network router cell (II): Decision unit

• Static routing table

• Managing data transactions:

– Check in

– Packet arbitration (MUX-DEMUX switch)

• Fixed

• Round-robin

• Data broadcast

– Configure routing path

Action list with candidate transactions (parallel network)

O(0) O(1) O(2) O(3) O(4)
In(0) o
In(1) o o o
In(2) x
In(3) o
In(def) x

Action list with candidate transactions (MUX-DEMUX switch)

O(0) O(1) O(2) O(3) O(4)
In(0) x
In(1) o x x
In(2) x
In(3) x
In(def) x
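The two packet arbitration policies named for the MUX-DEMUX switch, fixed and round-robin, can be sketched as below. This is a generic textbook formulation, not the decision unit's actual logic; the function names and the four-input request vector are hypothetical.

```python
def fixed_priority_grant(requests):
    """Grant the lowest-numbered requesting input, or None if idle."""
    for index, requesting in enumerate(requests):
        if requesting:
            return index
    return None

def round_robin_grant(requests, last_granted):
    """Grant the next requesting input after the one served last,
    wrapping around, so that every requester is served in turn."""
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None

requests = [True, True, False, True]   # inputs 0, 1 and 3 want the output
grants = []
last = 3
for _ in range(4):
    last = round_robin_grant(requests, last)
    grants.append(last)
```

With a fixed policy input 0 would win every cycle in this example, whereas round-robin serves inputs 0, 1 and 3 in rotation.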

Static & Dynamic configuration (I)

[Figure: system overview with a master (icache, dcache), MPMC, memory, StreamCtrl, Conf.Ctrl and router cells (R).]

Static & Dynamic configuration (II)

[Figure: router cells (R) and blocks M1 to M4.]

Case study: Reconfigurable FIR

• FIR filter

– Processing cell: MAC

– Memory cell: Input data FIFO, coefficient ROM

• Time-multiplexed structure for area-driven applications.

• Unfolding (parallelization) to improve processing throughput.

• High-precision computations.
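The trade-off between the time-multiplexed and the unfolded FIR structure can be illustrated in software. This is a behavioral sketch, not the cell-array mapping from the talk: the unfolding factor of 2 and the function names are assumptions, and the inner loops only mimic how one or two MAC units would be kept busy.

```python
def fir_time_multiplexed(x, h):
    """One MAC unit reused for every tap: len(h) MAC steps per output."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]        # single multiply-accumulate
        y.append(acc)
    return y

def fir_unfolded_by_2(x, h):
    """Unfolding by 2: two MAC units produce outputs n and n+1 in the
    same tap loop, halving the number of loop passes."""
    y = [0] * len(x)
    for n in range(0, len(x), 2):
        acc0 = acc1 = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc0 += coeff * x[n - k]       # MAC unit 0
            if n + 1 < len(x) and n + 1 - k >= 0:
                acc1 += coeff * x[n + 1 - k]   # MAC unit 1
        y[n] = acc0
        if n + 1 < len(x):
            y[n + 1] = acc1
    return y

x = [1, 2, 3, 4, 5]
h = [1, 0, -1]
y_tm = fir_time_multiplexed(x, h)
y_unf = fir_unfolded_by_2(x, h)
```

Both versions compute the same outputs; the unfolded one spends twice the arithmetic hardware to roughly double the throughput.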

Case study: Reconfigurable FFT processor

• Radix-2² structure

• Folding

Radix-2² FFT building block

• Basic radix-2² FFT building block

• A 2,048-point radix-2² pipeline FFT

Radix-2² pipeline FFT

• Simple mapping

– Simple to scale up.

– Local communication only.

– High storage capacity demand in each single memory cell.

• Simple mapping with concatenated memory cells

– Low storage capacity demand in each single memory cell.

– Global data communications.
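The storage argument behind the two mappings can be made concrete with a little arithmetic. This is a sketch under assumptions the slides do not confirm: it takes a single-path delay-feedback style pipeline in which butterfly stage s of an N-point FFT needs a feedback buffer of N / 2^(s+1) samples, and the memory cell capacity of 256 samples is a hypothetical figure.

```python
import math

def sdf_buffer_sizes(n_points):
    """Feedback buffer length per butterfly stage: N/2, N/4, ..., 1
    (assumed delay-feedback pipeline, N a power of two)."""
    stages = n_points.bit_length() - 1          # log2(N)
    return [n_points >> (s + 1) for s in range(stages)]

def cells_per_stage(n_points, cell_capacity):
    """Memory cells that must be concatenated per stage when one cell
    holds at most cell_capacity samples."""
    return [math.ceil(size / cell_capacity)
            for size in sdf_buffer_sizes(n_points)]

sizes = sdf_buffer_sizes(2048)                   # 2,048-point FFT
cells = cells_per_stage(2048, cell_capacity=256)
```

Under these assumptions the first stage alone needs 1,024 samples of storage, so either one large memory cell serves it (simple mapping) or several smaller cells are concatenated, at the price of data crossing cell boundaries.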

Time-multiplexed FFT (I)

Time-multiplexed FFT (II)

• FFT benchmark comparison

– Rapid system reconfiguration: 40 ns @ 300 MHz

– High performance: 2.5x vs. DSPs, 6.5x vs. GPPs

Architecture         | fmax [MHz] | FFT size [point] | Execution time [cc] | Code size [byte] | Reconfiguration code size [byte]
CGRA                 | 534        | 256 / 1024       | 2,242 / 9,943       | 1,032            | 30
Texas TMS-320VC5502  | 300        | 256 / 1024       | 5,389 / 25,921      | 462              | 462 (code reload)
ARM926EJ-S           | 276        | 256 / 1024       | 13,194 / 66,196     | -                | -
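The headline figures can be cross-checked against the benchmark numbers with a few lines of arithmetic. The cycle counts below are read from the flattened table, so the pairing of values with FFT sizes is an assumption; the ratios compare clock cycles only and ignore the different clock frequencies.

```python
# Execution time in clock cycles, per architecture and FFT size.
cycles = {
    "CGRA": {256: 2242, 1024: 9943},
    "TMS-320VC5502": {256: 5389, 1024: 25921},
    "ARM926EJ-S": {256: 13194, 1024: 66196},
}

def cycle_ratio(other, size):
    """How many times more cycles `other` needs than the CGRA."""
    return cycles[other][size] / cycles["CGRA"][size]

dsp_ratios = [cycle_ratio("TMS-320VC5502", n) for n in (256, 1024)]
gpp_ratios = [cycle_ratio("ARM926EJ-S", n) for n in (256, 1024)]

# 40 ns of reconfiguration time expressed in cycles at 300 MHz.
reconfig_cycles = round(40e-9 * 300e6)
```

The DSP ratios come out at roughly 2.4 and 2.6 and the GPP ratios at roughly 5.9 and 6.7, which appears consistent with the quoted 2.5x and 6.5x if those are averages over the two FFT sizes; 40 ns at 300 MHz corresponds to 12 clock cycles.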

Case study: Multi-standard OFDM synchronization

• Multiple wireless radio standards

• Concurrent data stream processing

• Coarse Time Synchronization

• Carrier Frequency Offset (CFO) estimation

$\arg\{\gamma(\theta)\}$

$[\gamma(\theta)]$
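One common way to obtain a coarse timing estimate and a CFO estimate from a metric γ(θ) is the cyclic-prefix correlation estimator, sketched below. This is an assumption about the kind of algorithm meant, not the method from the talk: the slide shows only γ(θ), and the function names, the symbol length of 16, the prefix length of 4 and the toy signal are all hypothetical.

```python
import cmath
import math
import random

def cp_correlation(r, theta, n_fft, n_cp):
    """gamma(theta): correlate n_cp samples starting at theta with the
    samples n_fft positions later, where a cyclic prefix would repeat."""
    return sum(r[k] * r[k + n_fft].conjugate()
               for k in range(theta, theta + n_cp))

def coarse_sync(r, n_fft, n_cp):
    """Return (timing estimate, CFO estimate in subcarrier spacings)."""
    n_theta = len(r) - n_fft - n_cp + 1
    gammas = [cp_correlation(r, t, n_fft, n_cp) for t in range(n_theta)]
    theta_hat = max(range(n_theta), key=lambda t: abs(gammas[t]))
    cfo_hat = -cmath.phase(gammas[theta_hat]) / (2 * math.pi)
    return theta_hat, cfo_hat

# Toy signal: 5 unrelated samples, then one symbol with cyclic prefix.
rng = random.Random(1)

def unit_phasor():
    return cmath.exp(1j * rng.uniform(-math.pi, math.pi))

n_fft, n_cp, start, cfo = 16, 4, 5, 0.1
body = [unit_phasor() for _ in range(n_fft)]
signal = ([unit_phasor() for _ in range(start)] + body[-n_cp:] + body
          + [unit_phasor() for _ in range(8)])
# Apply the carrier frequency offset as a phase ramp over the samples.
received = [s * cmath.exp(2j * math.pi * cfo * k / n_fft)
            for k, s in enumerate(signal)]
theta_hat, cfo_hat = coarse_sync(received, n_fft, n_cp)
```

In this noiseless toy case the correlation magnitude peaks where the prefix starts, and the phase of γ at that point carries the frequency offset. Such an estimator is only unambiguous for offsets within half a subcarrier spacing.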

Implementation results (I)

• 65 nm low-power regular VT CMOS:

– Area: 0.48 mm²

– Clock frequency: 534 MHz

• Adaptive word length scheduling.

• Adoption of different algorithms, e.g. novel sign-bit OFDM acquisition.

Summary

• Reconfigurable cell array enables hardware sharing at different levels, i.e., task-, function-, and algorithm-level.

• Coarse-grained reconfigurable cell array comprises distributed processing and memory cells, and a hierarchical NoC structure.

• In-cell dynamic reconfiguration enables fast context switching.

