Post on 20-Jan-2016


1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames)

Reconfigurable Architectures

• Forces that drive a reconfigurable architecture
  – Price
    • Mass production: 100K to millions of units
    • Experimental: 1 to 10s of units
  – Granularity of reconfiguration
    • Fine grain
    • Coarse grain
  – Degree of system integration/coupling
    • Tightly coupled
    • Loosely coupled

All are a function of the application that will run on the architecture.

Example Points in (Price, Granularity, Coupling) Space

[Figure: three-axis design space. Axes: Price ($100s to $1Ms), Granularity (fine to coarse), Coupling (loose to tight). Labels in the figure: Intel/AMD, Int, float, RFU, Processor, PC, ML507, Ethernet, Decode, Exec, Store.]

What’s the point of a Reconfigurable Architecture?

• Performance metrics
  – Computational
    • Throughput
    • Latency
  – Power
    • Total power dissipation
    • Thermal
  – Reliability
    • Recovery from faults

Increase application performance!

Typical Approach for Increasing Performance

• Application/algorithm implemented in software
  – Often easier to write an application in software
• Profile the application (e.g., gprof)
  – Determine where the application is spending its time
• Identify kernels of interest
  – e.g., the application spends 90% of its time in function matrix_multiply()
• Design custom hardware/instruction to accelerate kernel(s)
  – Analyze the kernel to determine how to extract fine/coarse grain parallelism (does any parallelism even exist?)

Amdahl’s Law!
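The reference to Amdahl's Law can be made concrete with a short calculation. The sketch below is illustrative only: the function name `amdahl_speedup` and the kernel speedup values are assumptions, and the 90% figure is the matrix_multiply() example from the slide.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when a fraction of run time is sped up by kernel_speedup.

    Amdahl's Law: S = 1 / ((1 - p) + p / s)
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical hardware speedups for a kernel that takes 90% of run time.
    for s in (2, 10, 100, 1000):
        print(f"kernel {s:>4}x faster -> application {amdahl_speedup(0.9, s):.2f}x faster")
```

With 90% of the time in the kernel, a 10x faster kernel gives only about a 5.26x faster application, and no kernel speedup can push the application past 10x, because the remaining 10% still runs in software.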

Granularity

Granularity: Coarse Grain

• rDPA: reconfigurable Data Path Array
• Function units with programmable interconnects

[Figure: example 3x3 array of ALUs.]

Granularity: Fine Grain

• FPGA: Field Programmable Gate Array
• Sea of general purpose logic gates

[Figure: 4x4 array of CLBs (Configurable Logic Blocks).]

Granularity: Trade-offs

Trade-offs associated with LUT size

Example: 2-LUT (4 = 2x2 bits) vs. 10-LUT (1024 = 32x32 bits)

[Figure: two equal 1024-bit areas, one tiled with 2-LUTs and one holding a single 10-LUT.]

Microprocessor

[Figure, built up over several slides: a block with signals A, B, and op, bus widths labeled 4, 3, and 3, and a 3-bit output; the final build shows three copies of the block.]

Bit logic and constants

(A and “1100”) or (B or “1000”)

[Figure: 4-bit inputs A and B feeding AND, OR, and OR gates with the constants 1 and 0; a highlighted region marks the area that was required using 2-LUTs.]

It’s much worse: each 10-LUT has only one output.
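The bit logic example can be checked with a small software model of a LUT. This is a sketch under stated assumptions, not the slide's own construction: a k-input LUT is modeled as a table of 2^k bits with one output, the leftmost character of each constant string is taken as the most significant bit, and each output bit is given its own LUT. The names `lut_bits`, `build_2lut`, and `evaluate` are hypothetical.

```python
from itertools import product

AND_CONST = "1100"  # constant ANDed with A (leftmost bit assumed to be the MSB)
OR_CONST = "1000"   # constant ORed with B


def lut_bits(k: int) -> int:
    """Configuration bits in a k-input, single-output LUT."""
    return 2 ** k


def build_2lut(and_bit: int, or_bit: int) -> list[int]:
    """Truth table for one output bit of (A and M) or (B or C).

    The table is indexed by (a << 1) | b, so it has 4 entries.
    """
    return [(a & and_bit) | (b | or_bit) for a, b in product((0, 1), repeat=2)]


# One 2-LUT per output bit, most significant bit first.
TABLES = [build_2lut(int(m), int(c)) for m, c in zip(AND_CONST, OR_CONST)]


def evaluate(a: int, b: int) -> int:
    """Evaluate the 4-bit expression by looking up each output bit in its 2-LUT."""
    out = 0
    for i, table in enumerate(TABLES):
        shift = len(TABLES) - 1 - i
        a_bit = (a >> shift) & 1
        b_bit = (b >> shift) & 1
        out |= table[(a_bit << 1) | b_bit] << shift
    return out


TWO_LUT_TOTAL_BITS = len(TABLES) * lut_bits(2)   # four 2-LUTs
TEN_LUT_TOTAL_BITS = len(TABLES) * lut_bits(10)  # four 10-LUTs, one per output bit

if __name__ == "__main__":
    print("2-LUT tables (MSB first):", TABLES)
    print("config bits with 2-LUTs:", TWO_LUT_TOTAL_BITS)
    print("config bits with 10-LUTs:", TEN_LUT_TOTAL_BITS)
```

Under these assumptions the constants fold into the tables: the top bit is always 1, the next bit is A or B, and the two low bits simply pass B through. Because a 10-LUT has a single output, producing four output bits still takes four of them, which is the sense in which the coarse LUT is "much worse" for this kind of logic.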

Granularity: Example Architectures

• Fine grain: GARP

• Coarse grain: PipeRench

Granularity: GARP

[Figure, built up over several slides: Garp chip containing a CPU and an RFU, with I-cache, D-cache, and config cache, connected to memory. RFU detail labels: RFU control (1), Execution (16, 2-bit), N, PE (Processing Element).]

Example computations in one cycle
• A<<10 | (b&c)
• (A-2*b+c)

Granularity: GARP

Impact of configuration size
• 1 GHz bus frequency
• 128-bit memory bus
• 512 Kbits of configuration size

On an RFU context switch, how long does it take to load a new full configuration?

4 microseconds (512 Kbits / 128 bits per transfer = 4K transfers, at 1 ns each, assuming one transfer per bus cycle)

An estimate of the time for the CPU to perform a context switch is ~5 microseconds.

~2x increase in context switch latency!!
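The load-time estimate can be reproduced with a few lines of arithmetic. The sketch below assumes one full-width bus transfer per bus cycle, 1 Kbit = 1024 bits, and that the configuration load is simply added to the CPU context switch time; the function and constant names are hypothetical.

```python
def config_load_time_us(config_bits: int, bus_width_bits: int, bus_freq_hz: float) -> float:
    """Time in microseconds to stream a configuration over the memory bus.

    Assumes one bus-wide transfer per bus cycle and no other bus traffic.
    """
    transfers = -(-config_bits // bus_width_bits)  # ceiling division
    return transfers / bus_freq_hz * 1e6


CONFIG_BITS = 512 * 1024        # 512 Kbits, taking 1 Kbit = 1024 bits
BUS_WIDTH_BITS = 128
BUS_FREQ_HZ = 1_000_000_000     # 1 GHz
CPU_SWITCH_US = 5.0             # estimate quoted on the slide

LOAD_US = config_load_time_us(CONFIG_BITS, BUS_WIDTH_BITS, BUS_FREQ_HZ)
LATENCY_RATIO = (CPU_SWITCH_US + LOAD_US) / CPU_SWITCH_US

if __name__ == "__main__":
    print(f"configuration load: {LOAD_US:.3f} us")
    print(f"context switch latency grows by a factor of {LATENCY_RATIO:.2f}")
```

Under these assumptions the load takes roughly 4.1 microseconds, so the total context switch cost is a little under twice the CPU-only figure, consistent with the "~2x" on the slide.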

Granularity: GARP

[Figure: Garp chip block diagram, as on the earlier GARP slides.]

Reference: “The Garp Architecture and C Compiler”, http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf

Granularity: PipeRench

• Coarse granularity
• Higher (higher) level programming
• Reference papers
  – PipeRench: A Coprocessor for Streaming Multimedia Acceleration (ISCA 1999): http://www.cs.cmu.edu/~mihaib/research/isca99.pdf
  – PipeRench Implementation of the Instruction Path Coprocessor (Micro 2000): http://class.ee.iastate.edu/cpre583/papers/piperench_Micro_2000.pdf

Granularity: PipeRench

[Figure: rows of PEs separated by interconnect, with a global bus alongside. Each PE contains an 8-bit ALU and a register file.]

Granularity: PipeRench

[Figure, built up over several slides: three rows of PEs. A first chart plots pipeline stages 0 to 4 against cycles 1 to 6; a second chart plots pipeline stages 0 to 2 against cycles 1 to 6. Each build fills in further cells with stage numbers 0 to 4.]
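The idea of running a five-stage pipeline on three rows of PEs can be sketched as a simple schedule. The model below is an assumption made for illustration and is not taken from the slides or the referenced papers: one virtual stage is configured per cycle, physical stages are reused round-robin, and a virtual stage stays resident and executes until its physical stage is next reconfigured. The function name `schedule` and the "cfg"/"exe" tags are hypothetical.

```python
from typing import Optional

Cell = Optional[tuple[int, str]]  # (virtual stage, "cfg" or "exe"), or None if idle


def schedule(num_virtual: int, num_physical: int, num_cycles: int) -> list[list[Cell]]:
    """Round-robin mapping of virtual pipeline stages onto physical stages.

    table[p][c] describes what physical stage p does in cycle c (0-indexed).
    """
    table: list[list[Cell]] = [[None] * num_cycles for _ in range(num_physical)]
    for k in range(num_cycles):
        phys = k % num_physical   # physical stage reconfigured in cycle k
        virt = k % num_virtual    # virtual stage loaded into it
        table[phys][k] = (virt, "cfg")
        # The stage executes until this physical stage is reconfigured again.
        for c in range(k + 1, min(k + num_physical, num_cycles)):
            table[phys][c] = (virt, "exe")
    return table


if __name__ == "__main__":
    rows = schedule(num_virtual=5, num_physical=3, num_cycles=6)
    print("cycle      " + "  ".join(f"{c + 1:>6}" for c in range(6)))
    for p, row in enumerate(rows):
        cells = "  ".join(f"{'-' if cell is None else f'{cell[1]} {cell[0]}':>6}" for cell in row)
        print(f"physical {p} {cells}")
```

In this model every physical stage spends one cycle in three being configured, so at most two of the three rows do useful work in any cycle once the pipeline is full; that overhead is the price of fitting a deeper pipeline onto fewer rows.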

Degree of Integration/Coupling

• Independent reconfigurable coprocessor
  – Reconfigurable fabric does not have direct communication with the CPU
• Processor + reconfigurable processing fabric
  – Loosely coupled on the same chip
  – Tightly coupled on the same chip

Degree of Integration/Coupling

[Figure, built up over several slides: system block diagram with main memory; a CPU with Fetch, Decode, Execute, Memory, and Write Back stages, an ALU, and an FPU; L1 and L2 caches; memory controller; DMA controller; and an I/O controller serving USB, PCI, PCI-Express, SATA, hard drive, and NIC. Successive builds add an RPF at different points in the system, then a Config I/F and RPF I/O, and the last build shows an RFU next to the ALU and FPU.]

Next Class

• Reconfiguration Management
  – Chapter 4

Questions/Comments/Concerns

• Write down
  – Main point of lecture
  – One thing that’s still not quite clear
  OR
  – If everything is clear, then give an example of how to apply something from lecture

Lecture notes

Granularity: PipeRench

• Scheduling virtual stages onto physical stages
• Partial/dynamic reconfiguration (each cycle)

Granularity: GARP

• Impact of configuration size on performance
  – Context switching
• Garp features
  – Dynamically reconfigurable
  – Stores multiple configurations in an on-chip cache (4)
  – One configuration at a time
• Example app mapping to GARP (loop)
  – Amdahl's Law

The Garp Architecture and C Compiler
• http://www.cs.cmu.edu/~tcal/IEEE-Computer-Garp.pdf

Overview

• Dimensions
  – Price
  – Granularity
  – Coupling
  – To optimize app performance (compute (throughput, latency), power, reliability)
• RPF to efficiently implement VICs
  – Main picture the authors want to convey
• What’s the point of having a reconfigurable architecture
  – Example (increase app performance)
    • App -> SW/CPU
    • Profile
    • ID kernels of intense compute
    • Design custom hardware/instruction (Amdahl's Law)
  – Intel FPL paper, great example for reading by Friday

Reconfigurable Architectures

• RPF -> VIC (short slide)