
Implementing An Associative Processor on FPGAs

Date post: 12-Jan-2016
Upload: davida
Transcript
Page 2

A Conceptual View of the KSU ASC Model

[Figure: block diagram showing an Instruction Stream Control Unit, four cells each made up of a Memory and a PE, and a Cell Interconnection Network.]

Page 3

An Example for the Data Memory Organization: Auto Information Stored in the PE Cells

PE    ID  Color     Model   State  Rebate  …….
PE0   1   Burgundy  Focus   OH     170     …….
PE1   2   Blue      Taurus  OH     160     …….
PE2   3   Blue      Focus   OH     190     …….
PE3   4   Red       Focus   PA     180     …….

Page 4

The Prototype of the Byte-serial ASC Processor

[Figure: block diagram of the prototype. Labeled blocks: IS Control Unit; CPU (for sequential and parallel instructions); 32-bit Instruction Memory; 16 8-bit Common Registers; Associative Processing Array; Data Memory; PE Array; Responder Resolution Circuitry; MAX/MIN Circuitry.]

Page 5

Prototype of the 4-PE Associative Processing Array

[Figure: four PE cells (PE Cell 0 to PE Cell 3), each pairing a Data Memory with a PE (Data Memory0/PE0 to Data Memory3/PE3), together with the Responder Resolution Circuitry, the MAX/MIN Circuitry, and the At_Least_One_Responder signal.]

Page 6

A Processing Element Overview

[Figure: block diagram of one PE. Labeled blocks: 8-bit ALU with CarryOut; 1-bit ALU; two MUXes; 16 8-bit General-Purpose Registers; 16 1-bit Logical Registers; 16-deep 1-bit Mask Stack; 1-bit Responder Register; Comparator; Find/Step/ResolveFirst. Labeled signal sources: Common Registers, General-Purpose Registers, Logical Registers.]

Page 7

Instruction Set and Assembly Language (1)

• Data Transfer Instructions
– LD address, dstreg
– LDI immediate, dstreg
– LDRR srcreg, dstreg
– LDRRSPD srcreg
– ST srcreg, address

• Arithmetic and Logical Instructions (mnemonic srcreg1, srcreg2, dstreg)
– ADD SUB
– AND OR XOR NOT
– SLL SRL
– SLT SLE SGT SGE SEQ SNE

Page 8

Instruction Set and Assembly Language (2)

• Mask Stack and Responder Instructions
– SETMSK
– TOPMSK TOPMSKRSPD
– POPMSK POPMSKRSPD
– POPTHEM POPTHEMRSPD
– RPCMSK RPCMSKRSPD
– PUSHMSK PUSHMSKRSPD
– PUSHTHEM
– PUSHMSKTHEM
– STKTOMEM MEMTOSTK

– FIND
– STEP
– RESFST

Page 9

Instruction Set and Assembly Language (3)

• Maximum and Minimum Searching Instructions
– SETMXMI
– LDMXMI
– STMXMI
– MAX
– MIN

• Branch/Jump Instructions
– BNR
– BRS
– J

Page 10

Associative Operations

Related PE Components:

• The Responder Register:
to indicate whether a PE is a responder to a particular associative search or not

• The Step/Find/ResolveFirst Unit:
to support processing multiple responders in various ways

• The Mask Stack:
to represent at most 16 levels of association. The top of the Mask Stack always represents the current status of the PE: whether it is masked (‘1’) or unmasked (‘0’)

Page 11

Example of Associative Search: Find all Focus cars located in Ohio

• Perform the comparison: model == “Focus”, and store the result (either ‘1’ or ‘0’) into $LR1

• Perform the comparison: location == “Ohio”, and store the result into $LR2

• AND $LR1 with $LR2, and store the result into the Responder Register

(Note: the instructions above are performed by all PEs in parallel; such instructions are called unmasked instructions.)
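
The three search steps above can be sketched in software. This is only an illustrative Python model, not the prototype's VHDL: the list of dicts standing in for the PE data memories and the function name associative_search are assumptions made for the sketch, and the state is compared against "OH" because that is how it is stored in the auto example.

```python
# Illustrative model only: each dict stands for the data memory of one PE.
# Field values come from the auto example; the layout is an assumption.
cars = [
    {"id": 1, "color": "Burgundy", "model": "Focus",  "state": "OH", "rebate": 170},  # PE0
    {"id": 2, "color": "Blue",     "model": "Taurus", "state": "OH", "rebate": 160},  # PE1
    {"id": 3, "color": "Blue",     "model": "Focus",  "state": "OH", "rebate": 190},  # PE2
    {"id": 4, "color": "Red",      "model": "Focus",  "state": "PA", "rebate": 180},  # PE3
]


def associative_search(pes, model, state):
    """Return one responder bit per PE (1 = responder)."""
    # Each list holds one bit per PE. On the real array every PE does its
    # comparison at the same time; the loops here only simulate that.
    lr1 = [int(pe["model"] == model) for pe in pes]  # plays the role of $LR1
    lr2 = [int(pe["state"] == state) for pe in pes]  # plays the role of $LR2
    return [a & b for a, b in zip(lr1, lr2)]         # Responder Register


responders = associative_search(cars, "Focus", "OH")
print(responders)  # [1, 0, 1, 0]
```

In this model PE0 and PE2 are the responders, which matches the two Focus cars stored with state OH.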

Page 12

Unmasked and Masked Instructions

Unmasked Instruction:

Executed by all the PEs regardless of the state of the Mask Stack

Masked Instruction:

Executed only by those PEs with a ‘1’ on the top of their Mask Stack

Page 13

Example of Associative Search Using Masked Instructions (Find all Focus cars located in Ohio)

• Initialize the top of the Mask Stack to ‘1’

• Perform the comparison: model == “Focus”, and store the result (‘1’ or ‘0’) into $LR1

• Perform the comparison: location == “Ohio”, and store the result into $LR2

• AND $LR1 with $LR2, and store the result into the Responder Register

• AND the Responder Register with the top of the Mask Stack, push the ANDing result onto the Mask Stack, and also store it into the Responder Register

• Increase the rebate of all Focus cars in Ohio by 10 (masked instruction)
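
The masked sequence above can be modeled roughly as follows. The PE class, its method names, and the use of a Python list as the Mask Stack are all hypothetical; the sketch does not enforce the 16-level stack depth, and it compares the state against "OH" as stored in the auto example.

```python
class PE:
    """Hypothetical software model of one processing element."""

    def __init__(self, model, state, rebate):
        self.mem = {"model": model, "state": state, "rebate": rebate}
        self.mask_stack = [1]  # top of the Mask Stack initialized to '1'
        self.responder = 0

    def search(self, model, state):
        lr1 = int(self.mem["model"] == model)  # plays the role of $LR1
        lr2 = int(self.mem["state"] == state)  # plays the role of $LR2
        self.responder = lr1 & lr2
        # AND the responder bit with the top of the Mask Stack, push the
        # result, and store it back into the Responder Register.
        combined = self.responder & self.mask_stack[-1]
        self.mask_stack.append(combined)
        self.responder = combined

    def masked_add_rebate(self, amount):
        # Masked instruction: only PEs with '1' on top of the stack execute.
        if self.mask_stack[-1] == 1:
            self.mem["rebate"] += amount


pes = [
    PE("Focus", "OH", 170),   # PE0
    PE("Taurus", "OH", 160),  # PE1
    PE("Focus", "OH", 190),   # PE2
    PE("Focus", "PA", 180),   # PE3
]
for pe in pes:
    pe.search("Focus", "OH")
for pe in pes:
    pe.masked_add_rebate(10)

rebates = [pe.mem["rebate"] for pe in pes]
print(rebates)  # [180, 160, 200, 180]
```

Only PE0 and PE2 end up with ‘1’ on top of their stacks, so only their rebates change.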

Page 14

The MAX/MIN Circuitry, the Responder Resolution Circuitry, and PE3

[Figure: connections between the four PEs, the Responder Resolution circuitry, and the MAX/MIN circuitry, with PE 0 drawn in detail (Mask Stack, Responder, Step/Find/ResolveFirst, General Purpose Registers). Responder Resolution: inputs R0 to R3 from PE0 to PE3, outputs V0 to V3 to PE0 to PE3, and V4 to the CU. MAX/MIN: inputs D0 to D3 and R0 to R3 (GPR and RPD from each PE), registers MM0 to MM3, outputs to PE0 to PE3, and a clr signal.]

Page 15

Using the Falkoff Algorithm for MAX/MIN Search

Maximum-Value Searching (the following steps are performed in parallel for all the data)

• Search bit slices of the data from the most significant bit to the least significant bit:
As each bit slice is processed, each bit is ANDed with a corresponding MM bit (a 1-bit register used to indicate whether or not a data item is the maximum after processing a bit)

• Check the results of the AND to ensure that at least one new maximum value remains:

Page 16

Using the Falkoff Algorithm for MAX/MIN Search (continued)

If this condition is true, then the MM bits are updated by the results of AND; if all the results are 0, then the MM bits are not updated at this time

• Continue to process the remaining bit slices as above until all bits are processed

• After the least significant bit slice is processed:

If only one MM bit is ‘1’, it marks the largest number; if more than one MM bit is ‘1’, those data are tied for the maximum value

Page 17

Using the Falkoff Algorithm for MAX/MIN Search (continued)

Minimum-Value Searching:
• Similar to maximum-value searching, but complement each bit slice before ANDing it with the MM bits
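
The maximum and minimum search steps described on these slides can be sketched as a short Python function. This is a software illustration under stated assumptions, not the MAX/MIN circuit itself: the function name falkoff and its parameters are made up for the sketch, and the responder (RPD) inputs that appear in the circuit diagrams are left out, so every value takes part in the search.

```python
def falkoff(values, width=8, find_max=True):
    """Return the MM bits after a bit-serial search (1 = extreme value)."""
    mm = [1] * len(values)                    # every item starts as a candidate
    for bit in range(width - 1, -1, -1):      # most to least significant bit
        bit_slice = [(v >> bit) & 1 for v in values]
        if not find_max:
            # Minimum search: complement the bit slice before the AND.
            bit_slice = [b ^ 1 for b in bit_slice]
        anded = [b & m for b, m in zip(bit_slice, mm)]
        if any(anded):                        # at least one candidate remains
            mm = anded                        # otherwise keep the old MM bits
    return mm


rebates = [170, 160, 190, 180]               # rebates held by PE0 to PE3
print(falkoff(rebates))                      # [0, 0, 1, 0]
print(falkoff(rebates, find_max=False))      # [0, 1, 0, 0]
```

With the rebates from the auto example, the maximum search leaves only the MM bit of PE2 (190) set, and the minimum search leaves only that of PE1 (160). Tied values keep all of their MM bits set, for example falkoff([5, 5, 3]) gives [1, 1, 0].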

Page 18

Bit Slices (7..0) of Rebate Values and MM Bits During Processing

Search for the Maximum Rebate in the Data Memories

Bits are processed from MSB to LSB; the columns 7 to 0 show each MM bit after processing that bit.

Rebate  Bits 76543210  MM bit  Initialize  7  6  5  4  3  2  1  0
170     10101010       MM0     1           1  1  1  0  0  0  0  0
160     10100000       MM1     1           1  1  1  0  0  0  0  0
190     10111110       MM2     1           1  1  1  1  1  1  1  1  (max)
180     10110100       MM3     1           1  1  1  1  0  0  0  0

Page 19

MAX/MIN Circuit using the Falkoff Algorithm

[Figure: MAX/MIN circuit for four PEs. For each PE i (0 to 3) the diagram shows a Data i input, an RPD i input, an 8-bit shift register i, a “not” stage, an MM i register, and an output to RPD i. Control signals: OP and Mask_W.]

Page 20

The MAX/MIN Circuitry, the Responder Resolution Circuitry, and PE3

[Figure: same diagram as on Page 14.]

Page 21

Functionality of Responder Resolution Circuit

• Responder resolution:
Send an At_Least_One_Responder signal to the IS control unit

• Support responder selection:
Send a corresponding Responder_Before_Me signal to each PE’s Find_Step_ResolveFirst unit

Page 22

The Responder Resolution Circuitry for 4 PEs

R0 to R3: from the responder registers
V0 to V3: called Responder_Before_Me
V4: called At_Least_One_Responder

[Figure: the Responder Resolution Circuitry for PE0 to PE3, with a constant ‘0’ input, inputs R0 to R3, and outputs V0 to V4.]
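
One plausible reading of these signals is a simple OR chain: V0 is the constant ‘0’, each later V is the OR of all earlier responder bits, and V4 is the OR of all four. The slide does not spell out the gate-level logic, so the Python sketch below is an assumption, and the function name responder_resolution is hypothetical.

```python
def responder_resolution(r):
    """Return (v, at_least_one) for responder bits r[0..n-1].

    v[i] models Responder_Before_Me for PE i (V0 to V3 for four PEs);
    at_least_one models At_Least_One_Responder (V4 for four PEs).
    Assumes an OR chain, which the slide does not state explicitly.
    """
    v = []
    seen = 0                # the constant '0' fed into the first stage
    for bit in r:
        v.append(seen)      # 1 if any lower-numbered PE is a responder
        seen |= bit
    return v, seen


v, at_least_one = responder_resolution([0, 1, 0, 1])
print(v, at_least_one)  # [0, 0, 1, 1] 1
```

Under this reading, the first responder is the one PE whose responder bit is 1 while its Responder_Before_Me input is 0 (PE1 in the example).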

Page 23

Responder Processing

• Process responders in parallel:
– Use masked instructions

• Process responders sequentially:
– Need some responder selection instructions
– Need a responder selection mechanism

Page 24

Responder Selection Instructions

• Step: repetitively used to pick one responding PE each time for further processing (“for” loop)
e.g., to step through all the Focus cars in Ohio to list the features available on each car

• Find: select a responding PE, while still keeping all responders identifiable (“while” loop)
e.g., retrieve the tax rate from one of the cars located in OH, then increment the tax rate by a certain amount, afterwards apply this new tax rate to all the cars located in OH

Page 25

Responder Selection Instructions (continued)

• ResolveFirst: select a responder and only keep this responder identifiable
e.g., resolve one PE from several PEs which have the values tied for the maximum value
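
The difference between the three selection instructions can be illustrated on a plain list of responder bits. The sketch below is an interpretation, not the prototype's behavior: it assumes that the selected PE is the lowest-numbered responder and that Step clears the selected PE's responder bit, and all function names are hypothetical.

```python
def first_responder(r):
    """Index of the lowest-numbered PE whose responder bit is 1, or None."""
    for i, bit in enumerate(r):
        if bit:
            return i
    return None


def step(r):
    """Pick one responder and clear its bit, so the next call picks another."""
    i = first_responder(r)
    if i is not None:
        r[i] = 0
    return i


def find(r):
    """Pick one responder while leaving every responder bit unchanged."""
    return first_responder(r)


def resolve_first(r):
    """Pick one responder and clear the responder bits of all the others."""
    i = first_responder(r)
    for j in range(len(r)):
        r[j] = int(j == i)
    return i


# Step visits the responders one at a time until none are left.
r = [1, 0, 1, 0]
visited = []
i = step(r)
while i is not None:
    visited.append(i)
    i = step(r)
print(visited)  # [0, 2]
```

In the same model, find([1, 0, 1, 0]) returns 0 and leaves the list untouched, while resolve_first on that list returns 0 and reduces it to [1, 0, 0, 0].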

Page 26

The Responder Resolution Circuitry, MAX/MIN Circuitry, and PE3

[Figure: same diagram as on Page 14, with PE 3 drawn in detail instead of PE 0.]

Page 27

Design Language: VHDL

• A standard hardware description language used to model and design digital hardware
- Supports concurrent events
- Can be translated into hardware by some design tools

• Good for managing large design structures

• Supported by many CAD tool and programmable logic vendors

Page 28

Altera MAX+PLUS II Development System

• Design Entry: Graphic Editor, Text Editor, Waveform Editor, Symbol Editor, Floorplan Editor, Other Design Entry Tools

• Design Compilation: MAX+PLUS II Compiler

• Design Verification: Simulator, Waveform Editor, Timing Analysis, Other Verification Tools

• Device Programming: Programmer, Data I/O, Other Programmers

Page 29

Altera FLEX 10K FPLD

• FLEX10K70 Device:
- 3,744 LEs
- 9 EABs
- 70,000 gates in total

[Figure: Partial FLEX10K20 FPLD Architecture, showing EABs and LABs joined by the FastTrack Interconnect and surrounded by IOEs (I/O elements).]

Page 30

Simulation on FLEX 10K70 Chip

• The ISCU runs at about 10 MHz using 50% of the logic gates

• One EAB is used as a local memory for one PE; 4 PEs and the support circuit run at about 14 MHz using 82% of the logic cells.

From the simulation results, we can see that the FLEX 10K70 chip is not large enough for the 4-PE processor, so our current work targets Altera APEX 20K devices with 1 million gates in one chip.

Page 31

Future Work

• Explore more arithmetic features and associative operations

• Develop the complete ASC assembly language and the ASC back-end compiler

• Implement the PE cell interconnection network

• Implement the whole ASC processor on bigger and faster FPGA chips

• Develop the multiple instruction stream MASC model

