
A Programmable Single Chip Digital Signal Processing Engine

MAPLD 2005

Paul Chiang, MathStar Inc.
Pius Ng, Apache Design Solutions


2 MAPLD 2005/206Chiang

Presentation Outline

• Space-borne signal processing tasks
• FPOA architecture highlights
• Programmability and expandability
• System partition on FPOA device
• Spatial processing - 5x5 filter solution
• Temporal processing - motion estimation
• Internal bus and I/O throughput
• Resource utilization and future expansion


A System of Digital Signal Processing

Input Data → Data Extraction → Spatial or Temporal Processing → Frequency or Time Domain Processing → Feature Extraction → Characterization

Data Extraction
• mux/de-mux
• average filter
• min/max select

Spatial or Temporal Processing
• spatial edge filter
• temporal difference filter

Frequency or Time Domain Processing
• time domain low/high/bandpass filter
• frequency transformation
• frequency domain low/high/bandpass filter

Feature Extraction
• apply equation that defines feature
• checking threshold

Characterization
• analyze and characterize signals


Processing Requirements

• High computation requirement on the following basic operations: add/sub and mul/mac

• Mixed control functions such as loop control and decision making

• High I/O bandwidth to enable balanced processing vs. data input/output

• Large and fast temporary memory space to facilitate real-time processing

• Fast, programmable, direct data transfer to enable massively parallel processing


FPOA Architecture Summary

• Heterogeneous Array of 16-bit Silicon Objects
  - MAC, ALU, Truth Tables, Register File, Internal RAM
  - Single Clock Cycle Execution for All Objects
• Homogeneous 2-Layer Programmable Interconnect Mesh
• Tightly Integrated Data and Control Flow
• Integrated DDRII RLDRAM & SRAM Controllers
• High Speed I/O at Device Boundaries: SerDes, LVDS, HSTL


Reconfigurable Interconnect Network

• Each link consists of 16 Data bits, 1 valid bit, and 4 separate control bits

• Nearest Neighbors
  - Range = 1 (N/E/S/W + diagonal)
• Party Lines
  - Single cycle range = hop to 3 (skip 2) @ 1 GHz
  - Extra clock cycles for digital retiming
    - 1 extra: 25-object neighborhood
    - More clock cycles: entire chip


FPOA Solution

• Four GPIO ports with 44-bit I/O at 100 MHz, that is, 17.6 gigabits per second

• Two 250 MHz DDR 32-bit external memories with 32 gigabits per second of bandwidth

• 400 Silicon Objects running at 1 GHz
  - ALU: add/sub and combinational logic
  - MAC: mul/mac
  - Register File (RF): fast distributed data storage
  - Internal RAM (IRAM): intermediate data storage

• Party lines and muxes to support flexible internal bus as well as dedicated connections
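
The two bandwidth figures on this slide follow from port count, bus width, and clock rate. The sketch below reproduces the arithmetic, assuming that "DDR" means two transfers per clock and that each GPIO port moves one word per clock. The function names are illustrative only and do not belong to any FPOA tool.

```python
def gpio_bandwidth_bps(ports: int, bits_per_port: int, clock_hz: float) -> float:
    """Aggregate GPIO bandwidth, assuming one transfer per clock per port."""
    return ports * bits_per_port * clock_hz


def ddr_bandwidth_bps(memories: int, bus_bits: int, clock_hz: float) -> float:
    """Aggregate DDR bandwidth, assuming two transfers per clock (both edges)."""
    return memories * bus_bits * clock_hz * 2


gpio = gpio_bandwidth_bps(ports=4, bits_per_port=44, clock_hz=100e6)
xram = ddr_bandwidth_bps(memories=2, bus_bits=32, clock_hz=250e6)

print(f"GPIO:            {gpio / 1e9:.1f} Gbit/s")
print(f"External memory: {xram / 1e9:.1f} Gbit/s")
```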


Example FPOA Partition

[Block diagram. Labeled blocks: XRAM I/F (two instances), INT BUS Controller, Local Bus I/F, Host Local Bus, A/D IF, A/D, A/D, TempIRAM_0, TempIRAM_1, Spatial Processor, Data Selection Logic, Temporal Processor, Data Realignment, Feature Extraction, Spatial/Temporal Controller]


5x5 Convolution Filter

• Apply the filter operation to a 2D data array, D[0:m-1, 0:n-1], with a 5x5 2D mask, W[0:4, 0:4]

for i = 2; i < m - 3; i++
  for j = 2; j < n - 3; j++
    temp = 0;
    for k = -2; k < 3; k++
      for l = -2; l < 3; l++
        temp = D[i+k, j+l] * W[k+2, l+2] + temp
      end_of_l
    end_of_k
    Y[i, j] = temp;
  end_of_j
end_of_i
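
As a software reference for the loop nest above, here is a minimal pure-Python sketch. It keeps the slide's loop bounds as written (`i < m - 3`, `j < n - 3`). Leaving the uncomputed border outputs at zero is an assumption, since the slide does not say how borders are handled. The name `convolve_5x5` is illustrative.

```python
def convolve_5x5(D, W):
    """Apply a 5x5 mask W to the 2D list D using the slide's loop bounds."""
    m, n = len(D), len(D[0])
    # Outputs outside the computed region stay 0 (assumption).
    Y = [[0] * n for _ in range(m)]
    for i in range(2, m - 3):
        for j in range(2, n - 3):
            acc = 0
            for k in range(-2, 3):
                for l in range(-2, 3):
                    acc += D[i + k][j + l] * W[k + 2][l + 2]  # one MAC
            Y[i][j] = acc
    return Y


# Example: an 8x8 array of ones filtered with a mask of ones.
D = [[1] * 8 for _ in range(8)]
W = [[1] * 5 for _ in range(5)]
Y = convolve_5x5(D, W)
print(Y[2][2])  # each computed output sums 25 products
```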


Computation Requirements

• Assuming an m by n 2D data array and a 5x5 mask, there are 25 Multiply and Add (MAC) operations for each filtered sample

• The whole convolution filter operation requires 25 * m * n MAC operations

• With standard 720x480 image data at 30 frames per second, the convolution filter operation requires about 259 MMAC per second
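
The 259 MMAC per second figure can be checked directly from the frame size, frame rate, and mask size. A short sketch of that arithmetic follows; the function name is illustrative.

```python
def mac_rate(width: int, height: int, fps: int, mask_rows: int = 5, mask_cols: int = 5) -> int:
    """MAC operations per second when every sample gets a full mask."""
    return mask_rows * mask_cols * width * height * fps


rate = mac_rate(width=720, height=480, fps=30)
print(f"{rate / 1e6:.1f} MMAC/s")
```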


Data Storage

• 2D data storage in a 1D linear memory where four 16-bit words can be accessed concurrently

• Example of an 8x8 2D matrix stored in a 1D memory

Address (hex)   Data words
0x000F          D74 D75 D76 D77
0x000E          D70 D71 D72 D73
• • •
0x0004          D14 D15 D16 D17
0x0003          D10 D11 D12 D13
0x0002          D04 D05 D06 D07
0x0001          D00 D01 D02 D03

8x8 matrix:
D00 D10 D20 D30 D40 D50 D60 D70
D01 D11 D21 D31 D41 D51 D61 D71
D02 D12 D22 D32 D42 D52 D62 D72
D03 D13 D23 D33 D43 D53 D63 D73
D04 D14 D24 D34 D44 D54 D64 D74
D05 D15 D25 D35 D45 D55 D65 D75
D06 D16 D26 D36 D46 D56 D66 D76
D07 D17 D27 D37 D47 D57 D67 D77
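
The layout packs four consecutive elements Da0..Da3 or Da4..Da7 into one memory word, so each first-index value `a` occupies two addresses. The sketch below maps an element to its address and its position within the 4-word access. The listing shows D00 at 0x0001 and D74 at 0x000F, so the starting address is left as the hypothetical parameter `base` rather than fixed. All names here are illustrative.

```python
def word_location(a: int, b: int, dim: int = 8, words_per_access: int = 4, base: int = 0):
    """Return (address, lane) for element D[a][b] in the packed 1D layout.

    `base` is an assumed starting address; `lane` is the position of the
    element among the words fetched in one access.
    """
    accesses_per_line = dim // words_per_access   # 2 for an 8x8 matrix
    address = base + a * accesses_per_line + b // words_per_access
    lane = b % words_per_access
    return address, lane


addr, lane = word_location(7, 4)            # D74 with base 0
print(hex(addr), lane)
addr1, lane1 = word_location(1, 5, base=1)  # D15 with base 1
print(hex(addr1), lane1)
```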


Data Access Analysis

• Samples are stored in external memory, which has slower access speed

• Maximize data bandwidth by accessing 4 words at a time

• Use Register Files to store weights and sample data so that they can be repeatedly used without going out to external memory

• Perform calculations on 4 pixels concurrently, rotating coefficients and samples so that together they form the convolution operation


Data Processing Analysis

[Dataflow diagram: four multipliers (MUL) feeding an adder tree that produces Y22, Y32, Y42, and Y52]

MUL input sequences:
MUL 1: samples D00, D10, D20, D30; coefficients W00, W10, W20, W30
MUL 2: samples D01, D11, D21, D31; coefficients W01, W11, W21, W31
MUL 3: samples D02, D12, D22, D32; coefficients W02, W12, W22, W32
MUL 4: samples D03, D13, D23, D33; coefficients W03, W13, W23, W33

Output array:
Y00 Y10 Y20 Y30 Y40 Y50
Y01 Y11 Y21 Y31 Y41 Y51
Y02 Y12 Y22 Y32 Y42 Y52
Y03 Y13 Y23 Y33 Y43 Y53
Y04 Y14 Y24 Y34 Y44 Y54
Y05 Y15 Y25 Y35 Y45 Y55
Y06 Y16 Y26 Y36 Y46 Y56
Y07 Y17 Y27 Y37 Y47 Y57
Y08 Y18 Y28 Y38 Y48 Y58
Y09 Y19 Y29 Y39 Y49 Y59

Note 1: with a 5x5 filter the first two rows and columns are skipped
Note 2: the sequence pattern of samples and coefficients is for the concurrent calculation of Y22, Y32, Y42, and Y52


FPOA Solution

• Temporary data storage: 5 RFs, 3 ALUs
• Data access control: 3 ALUs
• Multiplier: 4 MACs
• Adder Tree: 9 ALUs
• Temporary Results: 2 RFs, 1 IRAM, 2 ALUs

[Block diagram. Labeled blocks: Input Samples, Sample RF, Coef. RF, four MACs, Control Logic, Data Access Control, Adder Tree, Temporary Results, Results]


5x5 Convolution Filter Performance

• FPOA Resources
  - ALU: 17
  - RF: 7
  - MAC: 4
  - IRAM: 1
  - Total: 28 SOs + 1 IRAM

• Data throughput
  - 20 results every 125 cycles
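
The throughput figure matches 25 MACs per result spread over 4 MAC objects: 25 / 4 = 6.25 cycles per result, or 20 results in 125 cycles. The sketch below converts that into results per second for a given clock. The clock rate is a parameter because the slides quote both 1 GHz (Silicon Object speed) and 400 MHz (the example application). The function name is illustrative.

```python
from fractions import Fraction

CYCLES_PER_RESULT = Fraction(125, 20)   # from the slide: 20 results / 125 cycles
MACS_PER_RESULT = 25                    # 5x5 mask
MAC_UNITS = 4                           # MAC objects used by the filter

# Cross-check: 25 MACs on 4 units is also 6.25 cycles per result.
consistent = CYCLES_PER_RESULT == Fraction(MACS_PER_RESULT, MAC_UNITS)


def results_per_second(clock_hz: int) -> Fraction:
    """Filtered samples per second at the given clock (exact arithmetic)."""
    return clock_hz / CYCLES_PER_RESULT


required = 720 * 480 * 30  # samples per second for 720x480 at 30 frames/s
for clock in (1_000_000_000, 400_000_000):
    rate = results_per_second(clock)
    print(f"{clock / 1e6:.0f} MHz: {float(rate) / 1e6:.1f} M results/s "
          f"(required: {required / 1e6:.3f} M)")
```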


Motion Estimation

• Identify the movement of a similar pattern over time

• The main computation involves calculating the sum of absolute differences (SAD) between two 8x8 blocks, i.e. X[0:7, 0:7] and Y[0:7, 0:7]

sum = 0;
for i = 0 to 7
  for j = 0 to 7
    temp = X[i, j] - Y[i, j]
    sum = sum + abs(temp)
  end_of_j
end_of_i
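
A direct Python rendering of the SAD loop, as a software reference only. The function name `sad_8x8` and the test blocks are illustrative.

```python
def sad_8x8(X, Y):
    """Sum of absolute differences between two 8x8 blocks (lists of lists)."""
    total = 0
    for i in range(8):
        for j in range(8):
            total += abs(X[i][j] - Y[i][j])
    return total


# Example: two constant blocks that differ by 2 in every position.
block_x = [[3] * 8 for _ in range(8)]
block_y = [[5] * 8 for _ in range(8)]
print(sad_8x8(block_x, block_y))  # 64 positions, each contributing 2
```

In a motion-estimation search, the candidate block with the smallest SAD against the reference block is taken as the best match.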


SAD Computation Dataflow

• 3 cycles throughput
• Generates two partial sums of positive differences

[Dataflow diagram. Inputs: X sequences X00...X70, X01...X71, X02...X72, ..., X07...X77 and the corresponding Y sequences. Labeled elements: Compare (X > Y, Y > X), Sub Y & Add, Sub X & Add, four C_S_A blocks, Adder Tree, SAD output]
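
The "two partial sums of positive differences" idea avoids an explicit absolute value: a comparison steers each pair into either an X - Y accumulator or a Y - X accumulator, and the two are added at the end. The sketch below is a software model of that idea only. It does not model the C_S_A blocks or the adder tree, and the function name is illustrative.

```python
def sad_two_partial_sums(X, Y):
    """Return (sum where X > Y, sum where Y > X, total SAD)."""
    sum_x_minus_y = 0  # "Sub Y & Add" path, taken when X > Y
    sum_y_minus_x = 0  # "Sub X & Add" path, taken when Y > X
    for x_row, y_row in zip(X, Y):
        for x, y in zip(x_row, y_row):
            if x > y:
                sum_x_minus_y += x - y
            elif y > x:
                sum_y_minus_x += y - x
            # equal samples contribute nothing to either sum
    return sum_x_minus_y, sum_y_minus_x, sum_x_minus_y + sum_y_minus_x


X = [[5, 1], [2, 2]]
Y = [[3, 4], [2, 0]]
pos, neg, total = sad_two_partial_sums(X, Y)
print(pos, neg, total)
```

Both partial sums are non-negative, so their sum equals the SAD computed with `abs`.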


SAD Performance

• FPOA Resources
  - ALU: 35
  - RF: 1
  - Total: 36 SOs

• Data throughput
  - 24 cycles per 8x8 block


Internal System Bus

• Links all processing modules and the external host to the external system memory for data accesses

• Host-controlled round-robin access from module to module

• User-defined package format to utilize the 16-bit party line and minimize the access overhead


System Bus Implementation

[Block diagram. Labeled blocks: Memory Controller, XRAM RLDRAM, Processing Element #1, Processing Element #2, Processing Element #3]


System Bus Performance

• FPOA Resources
  - ALU: 20

• Cycles
  - XRAM read: 4 cycles
  - XRAM write: 4 cycles
  - Module switch: 10 cycles
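
These cycle counts allow a rough estimate of how long one round-robin pass over the modules takes. The sketch below is a simple cost model with assumptions the slides do not state: costs add linearly, every module visit pays one module switch, and accesses do not overlap. The function name and the request format are hypothetical.

```python
READ_CYCLES = 4     # XRAM read, from the slide
WRITE_CYCLES = 4    # XRAM write, from the slide
SWITCH_CYCLES = 10  # module switch, from the slide


def round_robin_cycles(requests):
    """Cycles for one pass over all modules.

    `requests` is a list of (reads, writes) tuples, one per module,
    visited in order. Assumes no overlap between accesses.
    """
    total = 0
    for reads, writes in requests:
        total += SWITCH_CYCLES + reads * READ_CYCLES + writes * WRITE_CYCLES
    return total


# Example: three modules with different access counts in one pass.
one_pass = round_robin_cycles([(2, 1), (0, 0), (4, 4)])
print(one_pass)
```

Under this model the switch cost dominates when modules make few accesses per visit, which shows why minimizing access overhead matters.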


Performance of an Example Space Satellite Application

• Processing Throughput
  - About 10 million samples per second

• FPOA Resources (% of a device with 400 SOs running at 400 MHz)
  - Cycle utilization: 21%
  - SO utilization: 51%
  - IRAM utilization: 25%
  - XRAM b/w: 49% (100 MHz DDR RLDRAM)

