
ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to...

Date post: 04-Jan-2016
Upload: fay-chapman
Transcript
Page 1: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

ENG6530 Reconfigurable Computing Systems

Introduction to Reconfigurable Computing

Page 2: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Topics

• The VLSI Design Process
  – Application Specific Integrated Circuits
  – Issues related to power consumption
  – Issues related to scaling
• Traditional Von Neumann Architecture
  – Limitations
  – Enhancements to VNA
• Reconfigurable Computing (fill the gap!)
  – Research Issues
• Summary

Page 3: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

References

I. “Reconfigurable Computing: Accelerating Computation with FPGAs”, by Maya Gokhale

II. “Introduction to Reconfigurable Computing: Architectures, Algorithms and Applications”, by C. Bobda

III. “Computer Organization and Design”, by Patterson and Hennessy

IV. “Digital Integrated Circuits: A Design Perspective”, by Jan Rabaey

Page 4: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Computing Devices Everywhere!

[Figure: computing devices in everyday life: PDA, body, entertainment, household, communication, home networking, car, medicine, PC, super computer, game console]

Page 5: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

The Transistor Revolution

• First transistor: Bell Labs, 1948
• Bipolar logic: 1960s
• Intel 4004 processor
  – Designed in 1971
  – About 2,300 transistors
  – Speed: roughly 1 MHz operation (740 kHz maximum clock)

Page 6: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

The VLSI Design Process

Specification → Functional design → Logic design → Circuit design → Physical design → Test/Fabrication

Physical design: Partitioning → Floorplanning & Placement → Routing

Very costly and time consuming.

Page 7: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Productivity Gap in VLSI Design

A growing gap between design complexity and design productivity

Page 8: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

MOSFET: Metal Oxide Semiconductor Field Effect Transistor

• A voltage-controlled device
• Handles less current than a BJT (slower)
• Dissipates less power
• Achieves higher density on an IC
• Has full voltage swing, 0 to 5 V

Page 9: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

MOS Transistors: Types and Symbols

[Figure: circuit symbols with drain (D), gate (G), source (S) and bulk (B) terminals for NMOS enhancement, NMOS depletion, PMOS enhancement, and NMOS with bulk contact]

Circuits can be built using either NMOS transistors or PMOS transistors

Page 10: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Implementing Logic using: nMOS vs. pMOS Devices

Page 11: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Complementary MOS (CMOS)

• NMOS transistors pass a "strong" 0 but a "weak" 1
• PMOS transistors pass a "strong" 1 but a "weak" 0
• Combining both leads to circuits that can pass strong 0s and strong 1s

[Figure: pass gate between X and Y with control inputs C]

Page 12: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Static Complementary MOS (CMOS)

[Figure: inputs In1, In2, …, InN drive a pull-up network (PUN, PMOS only) connected to VDD and a pull-down network (PDN, NMOS only) connected to VSS; the node between them is the output F(In1, In2, …, InN)]

PUN and PDN are dual logic networks.

At every point in time (except during the switching transients) each gate output is connected to either VDD or VSS via a low-resistance path.

Page 13: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

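CMOS Inverter (normal output)

[Figure: inverter symbol with input A and output Y, an empty truth table for A = 0 and A = 1, and the transistor schematic with a pull-up network to VDD and a pull-down network to GND]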

Page 14: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

CMOS Inverter

A = 1: the NMOS pull-down transistor is ON and the PMOS pull-up transistor is OFF, so Y = 0 (output connected to GND).

A | Y
0 |
1 | 0

Page 15: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

CMOS Inverter

A = 0: the PMOS pull-up transistor is ON and the NMOS pull-down transistor is OFF, so Y = 1 (output connected to VDD).

A | Y
0 | 1
1 | 0

Page 16: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Example Gate: NAND

Page 17: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Example Gate: NOR

Page 18: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

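Complex CMOS Gate

OUT = NOT(D + A • (B + C))

[Figure: transistor schematic of the gate, with a pull-up network and a pull-down network each built from transistors driven by inputs A, B, C and D]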
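The duality of the two networks can be checked by brute force. The sketch below is illustrative only: the function names are made up, and it assumes the usual static CMOS arrangement in which the NMOS pull-down network implements D + A • (B + C), so the gate output is the complement of that expression. The pull-up network is written as the series/parallel dual acting on inverted inputs.

```python
from itertools import product

def pull_down_conducts(a, b, c, d):
    """NMOS network: conducts (pulls the output to 0) when D + A*(B + C) is true."""
    return d or (a and (b or c))

def pull_up_conducts(a, b, c, d):
    """PMOS network: the dual structure, with each transistor on for a low input."""
    return (not d) and ((not a) or ((not b) and (not c)))

def networks_are_complementary():
    """True if exactly one of the two networks conducts for every input pattern."""
    return all(
        pull_down_conducts(a, b, c, d) != pull_up_conducts(a, b, c, d)
        for a, b, c, d in product([False, True], repeat=4)
    )

def gate_output(a, b, c, d):
    """Output is 1 only when the pull-up network connects it to VDD."""
    return pull_up_conducts(a, b, c, d)

if __name__ == "__main__":
    print(networks_are_complementary())
```

Because exactly one network conducts for each of the 16 input patterns, the output always has a low-resistance path to either VDD or VSS, which is the static CMOS property stated earlier.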

Page 19: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Sources of Power Consumption

Power has three components:

• Static power: when the input isn’t switching
• Dynamic capacitive power: due to charging and discharging of load capacitance
• Dynamic short-circuit power: direct current from VDD to GND when both transistors are on

Page 20: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Dynamic Power

° Dynamic power is required to charge and discharge load capacitances when transistors switch.

° One cycle involves a rising and falling output.
• On rising output, charge Q = C·VDD is required
• On falling output, charge is dumped to GND

[Figure: gate driving a load capacitance C at switching frequency fsw; the supply current iDD(t) drawn from VDD consists of (1) short-circuit current and (2) charge/discharge current]

Page 21: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Dynamic Power

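[Figure: gate driving a load capacitance C at switching frequency fsw, drawing supply current iDD(t) from VDD]

$$
\begin{aligned}
P_{\text{dynamic}} &= \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt \\
&= \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt \\
&= \frac{V_{DD}}{T}\left[T f_{sw}\, C\, V_{DD}\right] \\
&= C\,V_{DD}^{2}\,f_{sw}
\end{aligned}
$$

Short circuit power <10% of dynamic power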

Page 22: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Lowering Dynamic Power

$$P_{dyn} = C_L\,V_{DD}^{2}\,f$$

• Capacitance: function of fan-out, wire length, transistor sizes
• Supply voltage: has been dropping with successive generations
• Clock frequency: increasing…
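To see how strongly each knob acts, here is a minimal sketch of the dynamic power formula. The function name and the component values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(c_load, v_dd, f_sw):
    """Dynamic power in watts: P = C * VDD^2 * f (farads, volts, hertz)."""
    return c_load * v_dd ** 2 * f_sw

if __name__ == "__main__":
    # Hypothetical node: 1 pF load, 1 V supply, 1 GHz switching frequency.
    base = dynamic_power(1e-12, 1.0, 1e9)
    # Halving the supply voltage cuts power to a quarter (quadratic term).
    half_v = dynamic_power(1e-12, 0.5, 1e9)
    # Halving the frequency only halves the power (linear term).
    half_f = dynamic_power(1e-12, 1.0, 0.5e9)
    print(base, half_v, half_f)
```

The quadratic dependence on VDD is why lowering the supply voltage has been the preferred lever across technology generations.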

Page 23: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Static Power Consumption

Static power consumption:

• Static current: in CMOS there is no static current as long as Vin < VTN or Vin > VDD + VTP
• Leakage current: determined by the “off” transistor; influenced by transistor width, supply voltage, and transistor threshold voltages

[Figure: two inverters showing the leakage paths. With VI < VTN the NMOS transistor is off and carries Ileak,n; with the input at VDD the PMOS transistor is off and carries Ileak,p while the output is at Vo(low)]

Page 24: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Static Power Consumption

° Junction leakage

° Gate oxide leakage

° Subthreshold leakage

Page 25: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Implementation Choices (target technology)

Digital Circuit Implementation Approaches

• Custom
• Semicustom
  – Cell-based: Standard Cells, Macro Cells
  – Array-based: Pre-diffused (Gate Arrays), Pre-wired (FPGAs, PLDs)

Page 26: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Design Style I: Full Custom

[Figure: transistor-level layout of a cell with IN, Out, Vdd and Gnd]

The designer specifies the layout of each individual transistor and connection.

Page 27: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Design Styles

• Full Custom
  – Utilized for large production volume chips such as microprocessors.
  – No restriction on the placement of functional blocks and their interconnections.
  – Highly optimized, but labour intensive.

Page 28: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

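Design Style II: Gate-array

• Array of prefabricated gates/transistors
• Two organizations: sea of gates and channel based

[Figure: gate-array cells with PMOS and NMOS rows between Vdd and Gnd rails, separated by oxide isolation or gate isolation; a NAND gate with inputs A and B and output Out built using gate isolation, with the annotation "Can in principle be used by adjacent cell"]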

Page 29: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Design Style

• Gate Arrays
  – Pre-fabricated array of gates (could be NAND).
  – Design is mapped onto the gates, and the interconnections are routed.

Page 30: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Design Style III: Standard cells

[Figure: standard-cell layout showing cells, routing, and IO cells]

Page 31: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Courtesy RK Brayton (UCB) and A Kuehlmann (Cadence)

Page 32: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Design Styles

• Standard Cell
  – Utilized for smaller production ASICs that are generated by synthesis tools.
  – Layout arranged in rows of cells that perform computation.
  – Routing done on “channels” between the rows.

Page 33: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Field Programmable Gate Array (FPGA)

• Field programmable gate arrays
  – Pre-fabricated array of programmable logic and interconnections.
  – No fabrication step required.

Page 34: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Design Style Comparisons

STYLE            | Full Custom | Standard Cell | Gate Array | FPGA
Cell size        | Variable    | Fixed height  | Fixed      | Fixed
Cell type        | Variable    | Variable      | Fixed      | Prog.
Cell placement   | Variable    | In row        | Fixed      | Fixed
Interconnections | Variable    | Variable      | Variable   | Prog.
Design cost      | High        | Medium        | Medium     | Low

Page 35: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

ASIC (Application Specific Integrated Circuit)

An ASIC is an IC customized for a particular use, rather than intended for general purpose use, e.g., phones.

[Figure: ASIC with a mixture of full custom, RAM and standard cells; labelled blocks: dual port RAM, single port RAM, FIFO, full custom, standard cell]

Page 36: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Technology Scaling

Page 37: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Technology Scaling

Technology scaling has a threefold objective:
• Increase the transistor density
• Reduce the gate delay
• Stabilize dynamic power consumption

Main challenges faced are static power consumption, delivery and density, which determine the performance of the chip.

Finding solutions to these challenges makes technology scaling an important issue to academia and the industry.

Page 38: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

VLSI Trends: Moore’s Law

In 1965, Gordon Moore predicted that transistors would continue to shrink, allowing:
• Doubled transistor density every 18-24 months
• Doubled performance every 18-24 months

History has proven Moore right. But is the end in sight?
• Physical limitations
• Economic limitations

Gordon Moore, Intel Co-Founder and Chairman Emeritus

Image source: Intel Corporation www.intel.com

Page 39: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Technology Scaling: Moore’s Law

Amazingly visionary: the million transistor/chip barrier was crossed in the 1980s.
• 2300 transistors, 1 MHz clock (Intel 4004), 1971
• 16 million transistors (Ultra Sparc III)
• 42 million transistors, 2 GHz clock (Intel Xeon), 2001
• 55 million transistors, 3 GHz, 130 nm technology, 250 mm² die (Intel Pentium 4), 2004
• 140 million transistors (HP PA-8500)
• 1 billion transistors (SoC)

Page 40: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Moore’s law in Microprocessors

[Figure: transistors on lead microprocessors (millions, log scale from 0.001 to 1000) versus year, 1970 to 2010, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors; trend annotated "2X growth in 1.96 years!"]

Transistors on lead microprocessors double every 2 years
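The doubling rule is easy to turn into a projection. The sketch below is a rough illustration: the function name is made up, and it simply extrapolates from the 2300-transistor Intel 4004 of 1971 with a fixed two-year doubling period, which real products only follow approximately.

```python
def projected_transistors(start_count, start_year, year, doubling_period_years=2.0):
    """Extrapolate a transistor count assuming one doubling per period."""
    doublings = (year - start_year) / doubling_period_years
    return start_count * 2 ** doublings

if __name__ == "__main__":
    # Starting point taken from the slides: Intel 4004, 2300 transistors, 1971.
    for year in (1971, 1981, 1991, 2001):
        print(year, round(projected_transistors(2300, 1971, year)))
```

For 2001 this gives about 75 million transistors, the same order of magnitude as the 42 million quoted for the Intel Xeon of that year.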

Page 41: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Frequency

[Figure: clock frequency of lead microprocessors (MHz, log scale from 0.1 to 10000) versus year, 1970 to 2010, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors; trend annotated "Doubles every 2 years"]

Lead microprocessors’ frequency doubles every 2 years

Page 42: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Power Dissipation!

[Figure: power of lead microprocessors (watts, log scale from 0.1 to 100) versus year, 1971 to 2000, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors]

Lead microprocessors’ power continues to increase

Page 43: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Leakage or ‘Static’ Power

Leakage power becomes a substantial portion of total power.

Trend projects that the leakage power will account for up to 50% of total power.

Not a practical limit of power dissipation.

Page 44: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Power Dissipation Projection

Considering only active power, power dissipation increases from 100W in 1999 to 2000W in 2010.

Dotted line indicates the power dissipation including the leakage power.

Page 45: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Interconnect Scaling

• Higher densities are only possible if the interconnects also scale.
• Reduced width → increased resistance
• Denser interconnects → higher capacitance
• To account for increased parasitics and integration complexity, more interconnection layers are added:
  – thinner and tighter layers → local interconnections
  – thicker and sparser layers → global interconnections and power
• Higher resistance and capacitance lead to higher RC delay.
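A back-of-the-envelope model shows the resistance effect. This sketch assumes a rectangular wire with resistance R = ρ·L / (W·T) and treats the RC product as a stand-in for delay; the function names, the wire dimensions and the 1 fF capacitance are hypothetical, and the resistivity is the approximate bulk value for copper.

```python
COPPER_RESISTIVITY = 1.7e-8  # ohm-metres, approximate bulk value

def wire_resistance(length, width, thickness, resistivity=COPPER_RESISTIVITY):
    """Resistance of a rectangular wire: R = rho * L / (W * T)."""
    return resistivity * length / (width * thickness)

def rc_delay(resistance, capacitance):
    """RC time constant, used here as a simple proxy for wire delay."""
    return resistance * capacitance

if __name__ == "__main__":
    # Hypothetical 1 mm wire, 0.2 um thick, before and after halving its width.
    wide = wire_resistance(1e-3, 0.4e-6, 0.2e-6)
    narrow = wire_resistance(1e-3, 0.2e-6, 0.2e-6)
    print(narrow / wide)  # resistance ratio after halving the width
    # Capacitance held fixed here (assumption); in practice it rises with density.
    print(rc_delay(narrow, 1e-15) / rc_delay(wide, 1e-15))
```

Halving the width doubles the resistance, and with capacitance held constant the RC delay doubles too; since denser wiring also raises capacitance, the real penalty is larger.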

Page 46: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Technology Trends: Interconnect Delay

[Figure: typical gate delay and interconnect delay (ns, log scale from 0.1 to 10) versus minimum feature size, from 2.0 µ down to 0.35 µ]

Interconnect delay has become the performance limiter as technology shrinks into the submicron regime.

Page 47: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Technology Trend and Challenges

Source: ITRS’03

• Interconnect determines the overall performance
• In addition: noise, power => Design closure
• Furthermore: manufacturability => Manufacturing closure

Page 48: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Application-Specific Integrated Circuits

ASIC advantages
• Best performance (speed)
• Smallest chip (area)
• Best power consumption

ASIC limitations?
• Inflexible (fixed!)
• Expensive to design
• High fixed costs require large production runs
• Requires skillful designers

How to cope with complexity & cost?

Page 49: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Intellectual Property (IP)-based Design

Page 50: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

System on Chip (SoC)

• Increasing complexity → MPSoC, NoC
• Assembly of “prefabricated components”
• Maximize VC (IP) reuse: over 90%
• New economics: fast and correct design > optimum design
• Design and verification at the system level
  – interface between VCs
  – SW becomes more important

[Figure: SoC block diagram with µP, memory, video unit, graphics, coms, DSP, custom and software blocks]

Page 51: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

The Von Neumann Computer

Principle

In 1945, the mathematician Von Neumann (VN) demonstrated in a study of computation that a computer could have a simple structure, capable of executing any kind of program, given a properly programmed control unit, without the need for hardware modification.

ENIAC - The first electronic computer (1946)

Page 52: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

The Von Neumann Computer

Structure
• A memory for storing program and data. The memory consists of words of the same length.
• A control unit (control path) featuring a program counter for controlling program execution
• An arithmetic and logic unit (ALU), also called the data path, for program execution

[Figure: processor (central processing unit) containing the datapath with registers and the control path with instruction register, PC and address register, connected through data and address lines to a memory holding data and instructions]

Page 53: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

The Von Neumann Computer

Coding
A program is coded as a set of instructions to be sequentially executed.

Program execution
• Instruction Fetch (IF): The next instruction to be executed is fetched from the memory
• Decode (D): Instruction is decoded (operation?)
• Read operand (R): Operands read from the memory
• Execute (EX): Operation is executed on the ALU
• Write result (W): Results written back to the memory
• Instruction execution in cycle (IF, D, R, EX, W)
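The execution cycle can be mimicked in a few lines. This is a toy sketch, not a model of any real machine: the instruction format (operation, destination, two sources), the two supported operations and the dictionary used as memory are all assumptions made for illustration.

```python
def run(program, memory):
    """Execute instructions one at a time through the IF, D, R, EX, W steps."""
    pc = 0  # program counter
    while pc < len(program):
        instruction = program[pc]                  # IF: fetch the next instruction
        operation, dest, src1, src2 = instruction  # D: decode its fields
        a, b = memory[src1], memory[src2]          # R: read operands from memory
        if operation == "add":                     # EX: execute on the "ALU"
            result = a + b
        elif operation == "mul":
            result = a * b
        else:
            raise ValueError(f"unknown operation: {operation}")
        memory[dest] = result                      # W: write the result back
        pc += 1                                    # strictly sequential execution
    return memory

if __name__ == "__main__":
    memory = {"x": 2, "y": 3, "t": 0, "z": 0}
    program = [("add", "t", "x", "y"), ("mul", "z", "t", "t")]
    print(run(program, memory)["z"])  # (2 + 3) squared
```

Every instruction passes through all five steps before the next one starts, and each step touches memory or the single ALU in turn, which is the sequential behaviour the following question points at.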

What is the problem with this computing paradigm?

Page 54: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Bottlenecks in VN Architecture

Page 55: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Processor-Memory Performance Gap

[Figure: performance (log scale from 1 to 10000) versus year, labelled “Moore’s Law”. µProc: 55%/year (2X/1.5 yr). DRAM: 7%/year (2X/10 yrs). Processor-memory performance gap grows 50%/year.]
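The gap figure follows from compounding the two growth rates. A minimal sketch, using only the 55% and 7% annual rates quoted on the slide (the function name is illustrative):

```python
def gap_after(years, cpu_rate=0.55, dram_rate=0.07):
    """Ratio of processor to DRAM performance after compounding both rates."""
    return ((1 + cpu_rate) / (1 + dram_rate)) ** years

if __name__ == "__main__":
    print(gap_after(1))   # yearly growth factor of the gap
    print(gap_after(10))  # gap accumulated over a decade
```

The yearly factor is 1.55 / 1.07, roughly 1.45, so the gap widens by about 45% per year, in line with the rounded 50%/year on the slide, and grows by well over an order of magnitude within ten years.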

Page 56: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

The Von Neumann Computer

Advantage:
• Simplicity.
• Flexibility: any well coded program can be executed

Drawbacks:
• Speed efficiency: Not efficient, due to the sequential program execution (temporal resource sharing).
• Resource efficiency: Only one part of the hardware resources is required for the execution of an instruction. The rest remains idle.
• Memory access: Memories are about 5 times slower than the processor

How to compensate for deficiencies?

Page 57: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Improving Performance of VN (GPPs)

1. Technology Scaling
   – Improve performance (increase clock frequency!)
2. Improving Instruction Set of Processor
3. Application Specific Processors (DSP)
4. Use of Hierarchical Memory System
   – Cache can enhance speed
5. Multiplicity of Functional Units (H/W)
   – Adders/Multipliers/Dividers (CDC-6600)
6. Pipelining within CPU (H/W)
   – A four-stage pipeline (IF/ID/OF/EX)
7. Overlap CPU & I/O Operations (H/W)
   – DMA (Direct Memory Access) can be used to enhance performance
8. Time Sharing (S/W)
   – Multi-tasking assigns fixed or variable time slices to multiple programs
9. Parallelism & Multithreading (S/W) (H/W)
   – Compilers/Multi-core systems

Page 58: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Technology Scaling

• Technology scaling tends to increase transistor density and enhance performance.

• Scaling is essential to ensure sustained growth in the IC industry to meet future needs.

• However, the main challenges faced are:

– Power consumption,

– Manufacturing issues, i.e., yield

– New CAD tools that can support newer technologies

Page 59: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Instruction Set of Processor

CPU execution time (CPU time): the time the CPU spends working on a task. It does not include time waiting for I/O or running other programs.

$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{clock cycle time}$$

or

$$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{clock rate}}$$

Performance can be improved by reducing either the length of the clock cycle or the number of clock cycles required for a program.

Page 60: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Example: Improving Performance

Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a computer, B, that will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?

$$
\begin{aligned}
\text{CPU time}_A &= \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A} \\
10\ \text{seconds} &= \frac{\text{CPU clock cycles}_A}{4 \times 10^{9}\ \text{cycles/second}} \\
\text{CPU clock cycles}_A &= 10\ \text{seconds} \times 4 \times 10^{9}\ \text{cycles/second} = 40 \times 10^{9}\ \text{cycles} \\
\text{CPU time}_B &= \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B} \\
6\ \text{seconds} &= \frac{1.2 \times 40 \times 10^{9}\ \text{cycles}}{\text{Clock rate}_B} \\
\text{Clock rate}_B &= \frac{1.2 \times 40 \times 10^{9}\ \text{cycles}}{6\ \text{seconds}} = 8\ \text{GHz}
\end{aligned}
$$
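The same calculation can be written as a small function, which makes it easy to try other targets. The function and parameter names are illustrative; the numbers are the ones from the example.

```python
def required_clock_rate(time_a, clock_rate_a, cycle_factor, time_b):
    """Clock rate (Hz) that machine B needs to finish in time_b seconds.

    cycle_factor is how many times more clock cycles B needs than A.
    """
    cycles_a = time_a * clock_rate_a      # CPU clock cycles on machine A
    cycles_b = cycle_factor * cycles_a    # B needs proportionally more cycles
    return cycles_b / time_b

if __name__ == "__main__":
    rate_b = required_clock_rate(10.0, 4e9, 1.2, 6.0)
    print(f"{rate_b / 1e9:.1f} GHz")
```

Doubling the clock rate buys only a 10 s to 6 s improvement here, because 20% of the gain is spent on the extra cycles.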

Page 61: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Clock Cycles per Instruction

Not all instructions take the same amount of time to execute.

One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction.

$$\text{CPU clock cycles for a program} = \text{Instructions for a program} \times \text{Average clock cycles per instruction}$$

Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute. It is a way to compare two different implementations of the same ISA.

CPI for this instruction class:

Instruction class | A | B | C
CPI               | 1 | 2 | 3
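A short sketch shows how the per-class CPI values combine into a cycle count and an average CPI. The CPI table is the one above; the two instruction mixes are hypothetical, chosen only to show that the sequence with more instructions can still need fewer cycles.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}  # CPI per instruction class, from the table

def total_cycles(instruction_counts):
    """Sum of (instruction count x CPI) over all instruction classes."""
    return sum(count * CPI_BY_CLASS[cls] for cls, count in instruction_counts.items())

def average_cpi(instruction_counts):
    """Total clock cycles divided by total instructions executed."""
    return total_cycles(instruction_counts) / sum(instruction_counts.values())

if __name__ == "__main__":
    # Hypothetical instruction mixes for two code sequences.
    sequence_1 = {"A": 2, "B": 1, "C": 2}
    sequence_2 = {"A": 4, "B": 1, "C": 1}
    print(total_cycles(sequence_1), average_cpi(sequence_1))
    print(total_cycles(sequence_2), average_cpi(sequence_2))
```

Sequence 2 executes six instructions against five, yet takes 9 cycles instead of 10, because it leans on the cheap class A instructions.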

Page 62: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Determinants of CPU Performance

CPU time = Instruction_count x CPI x clock_cycle

                       | Instruction_count | CPI | clock_cycle
Algorithm              | X                 | X   |
Programming language   | X                 | X   |
Compiler               | X                 | X   |
ISA                    | X                 | X   | X
Processor organization |                   | X   | X
Technology             |                   |     | X

Page 63: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Application Specific Processors

#1: CPU designed for efficient DSP processing
– Multiple MAC units, 2 accumulators, additional adder, barrel shifter

#2: Multiple busses for efficient data and program flow
– Four busses and large on-chip memory that result in sustained performance near peak

#3: Highly tuned instruction set for powerful DSP computing
– Sophisticated instructions that execute in fewer cycles, with less code and low power demands

Page 64: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Memory Hierarchy: The Principle of Locality

The Principle of Locality:
• Programs access a relatively small portion of the address space at any instant of time.

Two Different Types of Locality:
• Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
• Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight line code, array access)

Take advantage of locality. How? Use a memory hierarchy.
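The effect of locality on a cache can be illustrated with a tiny simulator. This is a simplified sketch under stated assumptions: a direct-mapped cache with 64 lines of 8 words each (sizes chosen arbitrarily), word-granular addresses, and no modelling of timing or write policy.

```python
def hit_rate(addresses, num_lines=64, block_size=8):
    """Fraction of accesses that hit in a direct-mapped cache."""
    lines = [None] * num_lines           # block number currently held by each line
    hits = 0
    for address in addresses:
        block = address // block_size    # which memory block holds this word
        index = block % num_lines        # the one line this block may occupy
        if lines[index] == block:
            hits += 1                    # block already cached
        else:
            lines[index] = block         # miss: load the block, evicting the old one
    return hits / len(addresses)

if __name__ == "__main__":
    sequential = list(range(1024))              # spatial locality (array walk)
    looped = list(range(16)) * 10               # temporal locality (small loop)
    strided = [i * 512 for i in range(1024)]    # no locality, every access collides
    print(hit_rate(sequential), hit_rate(looped), hit_rate(strided))
```

The sequential walk misses once per 8-word block, the small loop misses only on its first pass, and the strided pattern maps every access to the same line and never hits, which is why a hierarchy only pays off for programs that exhibit locality.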

Page 65: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Levels of the Memory Hierarchy (Speed vs. Cost)

Level (upper, faster → lower, larger) | Capacity   | Access time           | Cost
CPU Registers                         | 100s bytes | <10s ns               |
Cache                                 | K bytes    | 10-100 ns             | 1-0.1 cents/bit
Main Memory                           | M bytes    | 200-500 ns            | 0.0001-0.00001 cents/bit
Disk                                  | G bytes    | 10 ms (10,000,000 ns) | 10^-5 - 10^-6 cents/bit
Tape                                  | infinite   | sec-min               | 10^-8

Transfer between levels | Staging transfer unit | Managed by     | Size
Registers ↔ Cache       | Instr. operands       | prog./compiler | 1-8 bytes
Cache ↔ Memory          | Blocks                | cache cntl     | 8-128 bytes
Memory ↔ Disk           | Pages                 | OS             | 512-4K bytes
Disk ↔ Tape             | Files                 | user/operator  | Mbytes

Page 66: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Harvard Architecture

[Figure: CPU with PC connected to a data memory and to a separate program memory, each through its own address and data lines]

The Harvard architecture uses a different approach than the Von Neumann architecture: the program memory and the data memory are not shared (they are on different busses).

This allows instructions to be fetched and executed concurrently with data accesses. The Harvard architecture allows for cleaner pipelining of instructions since there is no contention in fetching data vs. instructions.

Page 67: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Exploiting Parallelism

• Bit level parallelism: 1970 to ~1985
  – 4 bit, 8 bit, 16 bit, 32 bit microprocessors
• Instruction level parallelism (ILP): ~1985 through today
  – Pipelining (Enhance Throughput)
  – Superscalar
  – Limits to benefits of ILP?
• Process level or thread level parallelism; mainstream for general purpose computing?
  – Servers (Multi-Processing Systems)
  – High-end desktop dual processor (Multi-Core)

Page 68: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Pipelining

• Exploits parallelism at the instruction level.
• Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
• Today pipelining is key to making processors fast.
• Pipelining is not only used in general purpose processors but can also be used in hardware accelerators (Reconfigurable Computing Systems).

Page 69: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Pipelining Strategy

Page 80: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Speed Up

If the stages are perfectly balanced, then the time between instructions on the pipelined processor, assuming ideal conditions, is equal to:

$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}$$

Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages, i.e., a five-stage pipeline is nearly five times faster.

Pipelining improves performance by increasing instruction throughput. The execution time of an individual instruction remains the same (i.e., latency is unchanged).
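The "large number of instructions" condition can be made concrete with a small sketch. It assumes an idealized model that the slides do not spell out: every stage takes the same time, there are no hazards, and the first instruction needs `num_stages` cycles to fill the pipe, after which one instruction completes per cycle. The function name and parameters are illustrative only.

```python
def pipeline_speedup(num_instructions: int, num_stages: int, stage_time: float = 1.0) -> float:
    """Speedup of an ideal, perfectly balanced pipeline over a nonpipelined datapath."""
    # Nonpipelined: each instruction occupies the whole datapath for all stages.
    nonpipelined = num_instructions * num_stages * stage_time
    # Pipelined: num_stages cycles to fill the pipe, then one result per cycle.
    pipelined = (num_stages + num_instructions - 1) * stage_time
    return nonpipelined / pipelined


# The speedup approaches the number of stages only as the instruction count grows.
for n in (1, 10, 1_000, 1_000_000):
    print(n, round(pipeline_speedup(n, num_stages=5), 3))
```

With a single instruction there is no gain at all, which is the latency point made above: pipelining raises throughput, not the speed of any one instruction.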

Page 88: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Example: Conventional Data Path Timing

The figure shows the maximum delay values for each of the components of a typical data path.

1. 4 ns (3 ns + 1 ns) to read two operands from the register file.

2. 4 ns to perform an operation.

3. 4 ns to write the result back.

Total: 12 ns to perform a single micro-operation.

The rate of execution is then set at 1/12 ns = 83.3 MHz.

Can we make it faster?

Page 89: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Example: Pipelined Data Path Timing

We can break the delay of 12 ns by inserting registers between the different components of the system.

A register is inserted between the function unit and the register file (OF)

Another register can be inserted between the function unit and MUX D. (EX + WB)

3-stage pipeline: OF / EX / WB

The maximum delay is now 5 ns, allowing a maximum clock frequency of 200 MHz.
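The two clock figures in this example follow directly from the longest register-to-register delay. The sketch below only redoes that arithmetic; the helper name is hypothetical, and the slide does not break down how the 5 ns stage delay is composed, so it is taken as given.

```python
def max_clock_mhz(critical_path_ns: float) -> float:
    """Maximum clock frequency in MHz for a given worst-case delay in ns."""
    # 1 / (1 ns) = 1 GHz = 1000 MHz
    return 1_000.0 / critical_path_ns


conventional_ns = 4 + 4 + 4   # operand fetch + execute + write-back, from the slide
pipelined_ns = 5              # longest pipeline stage, from the slide

print(round(max_clock_mhz(conventional_ns), 1))   # conventional data path
print(max_clock_mhz(pipelined_ns))                # pipelined data path

# Throughput gain once the pipeline is full.
throughput_gain = conventional_ns / pipelined_ns
print(throughput_gain)
```

The gain is 2.4 rather than 3 for a three-stage pipeline because the slowest stage takes 5 ns instead of the ideal 12/3 = 4 ns.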

Page 91: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

It’s Not That Easy for Computers

• Limits to pipelining: Hazards prevent the next instruction from executing during its designated clock cycle

– Structural hazards: HW cannot support this combination of instructions (two dogs fighting for the same bone)

» In a computer system: instead of having two memories we have a single memory, and we need to write a value and also read a new instruction.

– Data hazards: Instruction depends on the result of a prior instruction still in the pipeline

» Example:
  Add $s0, $t0, $t1
  Sub $t2, $s0, $t3

– Control hazards: Caused by the delay between the fetching of instructions and decisions about changes in control flow:

» branches and jumps.
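The data hazard in the Add/Sub example is a read-after-write dependency: `Sub` reads `$s0` before `Add` has written it back. A minimal sketch of how such dependencies could be spotted is shown below. The `Instr` type, the `raw_hazards` function, and the `window` parameter (how many following instructions are still in flight) are assumptions made for illustration, not something defined in the slides.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    src1: str
    src2: str


def raw_hazards(program, window=2):
    """Return (producer index, consumer index, register) for each read-after-write pair."""
    hazards = []
    for i, producer in enumerate(program):
        # Only instructions issued shortly after the producer can still overlap with it.
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer = program[j]
            if producer.dest in (consumer.src1, consumer.src2):
                hazards.append((i, j, producer.dest))
    return hazards


program = [
    Instr("add", "$s0", "$t0", "$t1"),
    Instr("sub", "$t2", "$s0", "$t3"),   # reads $s0 produced by the add
]
print(raw_hazards(program))
```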

Page 92: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Resource Hazards

A resource hazard occurs when two or more instructions that are already in the pipeline need the same resource

The result is that the instructions must be executed in serial rather than parallel for a portion of the pipeline

A resource hazard is sometimes referred to as a structural hazard

Page 100: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Parallel Processing

Using more than one processor to solve a problem

The idea is that n processors operating simultaneously can achieve the result n times faster.

Motives

• Diminishing returns from ILP
  - Limited ILP in programs
  - ILP increasingly expensive to exploit
• Fault tolerance
• Availability of multiple (independent) threads/processes

Page 101: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Flynn’s Taxonomy of Multiprocessing

Single-instruction single-data stream (SISD) machines

Single-instruction multiple-data stream (SIMD) machines

Multiple-instruction single-data stream (MISD) machines

Multiple-instruction multiple-data stream (MIMD) machines

                      Instructions: Single (SI)         Instructions: Multiple (MI)
Data: Single (SD)     SISD: Single-threaded process     MISD: Pipeline architecture
Data: Multiple (MD)   SIMD: Vector processing           MIMD: Multi-threaded programming

Page 102: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Single Instruction Stream Single Data Stream (SISD)

In a single-processor computer, a single stream of instructions is generated by the program. The instructions operate on a single stream of data items (Traditional Von Neumann Architecture).

[Figure: Control, Processor, and Memory, linked by an instruction stream and a data stream]

Algorithms for SISD computers do not contain any parallelism

Page 103: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Single Instruction Stream Multiple Data Stream (SIMD)

• A specially designed computer in which a single instruction stream comes from a single program, but multiple data streams exist.
• The instructions from the program are broadcast to more than one processor.
• Each processor executes the same instruction in synchronism, but using different data.

Vector computers

Page 104: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

SIMD Architecture

[Figure: a single Control unit issues one instruction stream to processors P1, P2, ..., PN; the processors exchange data streams with a shared memory or interconnection network]

The processors operate synchronously and a global clock is used to ensure lockstep operation

Page 105: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

SIMD Application: Example

Add two matrices C = A + B

Say we have two matrices A and B of order 2 and we have 4 processors, i.e. we wish to calculate:

$$C_{11} = A_{11} + B_{11} \qquad C_{12} = A_{12} + B_{12}$$

$$C_{21} = A_{21} + B_{21} \qquad C_{22} = A_{22} + B_{22}$$

The same instruction (add the two numbers) is sent to each processor, but each processor receives different data
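The matrix addition above can be mimicked in a few lines. This is only a model of the broadcast idea: the list comprehension runs sequentially, whereas real SIMD hardware would perform the four additions in lockstep. The function name `simd_execute` and the sample matrix values are made up for illustration.

```python
def simd_execute(instruction, data_streams):
    """Apply one instruction to every data stream, as if broadcast to all processors."""
    return [instruction(*operands) for operands in data_streams]


A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]

# One (A_ij, B_ij) operand pair per processor: 4 processors for a 2x2 matrix.
streams = [(A[i][j], B[i][j]) for i in range(2) for j in range(2)]

# The single instruction "add the two numbers" is sent to every processor.
flat = simd_execute(lambda a, b: a + b, streams)
C = [flat[0:2], flat[2:4]]
print(C)
```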

Page 106: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Multiple Instruction Stream Single Data Stream (MISD)

• A computer with multiple processors each sharing a common memory.
• There are multiple streams of instructions and one stream of data.

[Figure: two Control units, each driving its own Processor, both connected to a single Memory]

Example?

Page 107: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

MISD example

Check whether a number Z is prime

• Each processor is assigned a set of test divisors in its instruction stream
• Each processor takes Z as input and tries to divide it by its divisors

MISD is awkward to implement and such machines are just experimental

No commercial MISD machine exists
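The prime-checking scheme can be sketched as follows. Each "processor" is modelled as a closure holding its own set of test divisors (its instruction stream), and all of them receive the same input Z (the single data stream). The function names, the choice of four processors, and the round-robin split of divisors are assumptions for illustration; the calls also run one after another here rather than concurrently.

```python
from math import isqrt


def make_processor(divisors):
    """Build a 'processor' whose instruction stream tests a fixed set of divisors."""
    def processor(z):
        return any(z % d == 0 for d in divisors)   # True if some divisor divides z
    return processor


def is_prime_misd(z, n_processors=4):
    if z < 2:
        return False
    # Candidate divisors 2..floor(sqrt(z)), dealt out round-robin to the processors.
    candidates = list(range(2, isqrt(z) + 1))
    processors = [make_processor(candidates[p::n_processors]) for p in range(n_processors)]
    # Every processor sees the same input z; z is prime if none finds a divisor.
    return not any(proc(z) for proc in processors)


print([n for n in range(2, 30) if is_prime_misd(n)])
```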

Page 108: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Multiple Instruction Stream Multiple Data Stream (MIMD)

• General purpose multiprocessor system.
• Each processor has a separate program, and one instruction stream is generated from each program for each processor.
• Each instruction stream operates upon different data.

The most general and most useful of our classifications.

Page 109: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

MIMD Architecture

[Figure: processors P1, P2, ..., PN, each with its own control unit C1, C2, ..., CN, connected through a shared memory or interconnection network]

Each processor operates under the control of an instruction stream issued by its own control unit.

Processors operate asynchronously in general

Page 110: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Parallel vs. Distributed

Parallel: Multiple CPUs within a shared memory machine

Distributed: Multiple machines with own memory connected over a network

[Figure labels: Shared Memory; Processor, Instructions, and data items (D) for each machine; Network connection for data transfer]

Page 111: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

MIMD: Parallel Programming Models

• Distributed Memory

– Explicit communication

» Send messages

» Send (tid, tag, message)

» Receive (tid, tag, message)

– Synchronization

» Block on messages (implicit sync)

• Shared Memory

– Implicit communication

» Using shared address space

» Loads and stores

– Synchronization

» Atomic memory operators

» Semaphores

Scalability?
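The contrast between the two models can be sketched on one machine. In the code below, threads stand in for processors, a `threading.Semaphore` guards a shared variable (shared memory), and a `queue.Queue` plays the role of the message channel (distributed memory). This is an analogy under those assumptions, not a real distributed-memory program; the function names and the `(tid, tag, message)` tuple layout simply echo the Send/Receive signature on the slide.

```python
import queue
import threading


def shared_memory_sum(values, n_workers=4):
    """Implicit communication: workers update one shared variable, guarded by a semaphore."""
    total = 0
    sem = threading.Semaphore(1)

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with sem:                 # explicit synchronization around the shared store
            total += partial

    threads = [threading.Thread(target=worker, args=(values[w::n_workers],))
               for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(values, n_workers=4):
    """Explicit communication: workers send (tid, tag, message) tuples to a collector."""
    inbox = queue.Queue()

    def worker(tid, chunk):
        inbox.put((tid, "partial", sum(chunk)))    # Send(tid, tag, message)

    threads = [threading.Thread(target=worker, args=(w, values[w::n_workers]))
               for w in range(n_workers)]
    for t in threads:
        t.start()
    total = 0
    for _ in range(n_workers):
        _tid, _tag, message = inbox.get()          # Receive blocks: implicit sync
        total += message
    for t in threads:
        t.join()
    return total


data = list(range(101))
print(shared_memory_sum(data), message_passing_sum(data))
```

In the shared-memory version the synchronization has to be written explicitly, while in the message-passing version the blocking receive is what keeps the collector from running ahead of the workers.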

Page 112: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Multi-Processing Issues

• How to assign tasks to processors?

• What if we have more tasks than processors?

• What if processors need to share partial results?

• How do we aggregate partial results?

• How do we know all processors have finished?

• What if processors die?

Page 113: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Speedup factor

$$S(n) = \frac{\text{Execution time on a single processor}}{\text{Execution time on a multiprocessor with } n \text{ processors}}$$

S(n) gives the increase in speed from using a multiprocessor

Speedup factor can also be cast in terms of computational steps

$$S(n) = \frac{\text{Number of steps using one processor}}{\text{Number of parallel steps using } n \text{ processors}}$$

Maximum speedup is n with n processors (linear speedup). This theoretical limit is not always achieved. WHY?

Page 114: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Amdahl’s Law

• Amdahl’s law is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using n processors.

• The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.

Page 115: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Amdahl’s Law

$$T = \frac{1}{(1 - a) + \dfrac{a}{s}}$$

T = Overall speedup

a = Fraction of the original program that could be enhanced by executing in parallel or transferring it to hardware.

s = Expected speedup obtained (from hardware) for that particular fraction of the program

Limitations?

Page 116: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Amdahl’s Law: Example

• Assume you profiled an application and noticed that 12% of the application can be parallelized and 88% of the operations are not parallelizable.

• Question: What is the maximum speedup of the parallelized version?

• Answer: Even if the parallel part took no time at all (s tends to infinity), the sequential 88% remains, so Amdahl’s law states that the maximum speedup of the parallelized version is:

1/(1 - 0.12) = 1.136 times as fast as the non-parallelized implementation.

Page 117: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Amdahl’s Law: Example

Floating point instructions improved to run 2X; but only 10% of actual instructions are FP

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.1) + 0.1/2} = \frac{1}{0.95} = 1.053$$

Law of diminishing return: Focus on the common case!
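Both Amdahl examples can be checked with a direct transcription of the formula T = 1 / ((1 - a) + a / s). The function name is illustrative, and treating "maximum speedup" in the 12% example as the limit of an unbounded s is an interpretation, expressed here with `math.inf`.

```python
import math


def amdahl_speedup(a: float, s: float) -> float:
    """Overall speedup when a fraction `a` of the work is sped up by a factor `s`."""
    return 1.0 / ((1.0 - a) + a / s)


# 12% of the application parallelizable, unlimited speedup on that part.
print(round(amdahl_speedup(0.12, math.inf), 3))

# 10% of instructions are floating point, and they run 2x faster.
print(round(amdahl_speedup(0.10, 2), 3))

# Diminishing returns: even a 1000x faster FP unit barely helps when FP is 10% of the work.
print(round(amdahl_speedup(0.10, 1000), 3))
```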

Page 118: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Spatial vs. Temporal Computing

Spatial (ASIC): $Ax^2 + Bx + C$

Temporal (Processor): $(Ax + B)x + C$
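The two ways of writing the polynomial compute the same value but suggest different execution styles. The sketch below assumes the first form is the one meant for spatial hardware (its products are independent and could be wired up as separate multipliers) and the factored form is the one meant for a processor (a chain of dependent steps reusing one ALU over time). That pairing is read off the order of the labels on the slide, and the function names are made up.

```python
def spatial_form(a, b, c, x):
    """A*x^2 + B*x + C: the two terms do not depend on each other."""
    term2 = a * x * x      # could be computed by one hardware unit
    term1 = b * x          # could be computed at the same time by another unit
    return term2 + term1 + c


def temporal_form(a, b, c, x):
    """(A*x + B)*x + C: each step needs the previous result."""
    acc = a * x            # step 1
    acc = acc + b          # step 2
    acc = acc * x          # step 3
    return acc + c         # step 4


print(spatial_form(3, 2, 1, 5), temporal_form(3, 2, 1, 5))
```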

Page 119: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Temporal vs. spatial based computing

Temporal-based execution (software)

Spatial-based execution (reconfigurable computing)

Ability to extract parallelism (or concurrency) from algorithm descriptions is the key to acceleration using reconfigurable computing

Page 120: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Methods for executing algorithms

Hardware (Application Specific Integrated Circuits)
Advantages:
• very high performance and efficient
Disadvantages:
• not flexible (can’t be altered after fabrication)
• expensive

Software-programmed processors
Advantages:
• software is very flexible to change
Disadvantages:
• performance can suffer if clock is not fast
• fixed instruction set by hardware

Reconfigurable computing
Advantages:
• fills the gap between hardware and software
• much higher performance than software
• higher level of flexibility than hardware

Page 121: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Reconfigurable Computing

The ideal device should combine:
• the flexibility of the Von Neumann computer
• the efficiency of ASICs

The ideal device should be able to:
• Optimally implement an application at a given time
• Re-adapt to allow the optimal implementation of a new application

We call such a device a reconfigurable device.

Definition: Reconfigurable computing can be defined as the study of computations involving reconfigurable devices. This includes:

1. Architecture, 2. Algorithms, 3. Applications.

Page 122: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

How to fill the gap?

[Figure: Flexibility versus Performance chart positioning GPs, DSP, RCS, and ASIC]

Page 123: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Reconfigurable Hardware (FPGAs)

KEY ADVANTAGE: Performance of Hardware, Flexibility of Software

[Figure labels: CLB, Block RAM, IP Core (Multiplier)]

Page 124: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Why is reconfigurable computing more relevant these days?

• Demand for high-performance computation is soaring:
– large-scale optimization problems, physics and earth simulation, bioinformatics, signal processing (e.g. HDTV), etc.
• Why are software-programmed processors no longer attractive?
– Temporal (sequential) execution of instructions is no longer getting faster
– General-purpose multi-core processors require coarse-grain thread-level parallelism
• Why are reconfigurable fabrics currently attractive?
– Increased integration densities allow large FPGAs that can implement substantial functions
– They provide the spatial computational resources required to implement massively parallel computations directly in hardware

Page 125: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Reconfigurable Devices

Page 126: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

1. Rapid prototyping

Testing hardware in real conditions before fabrication

Software simulation
• Relatively inexpensive
• Slow
• Accuracy?

Hardware emulation
• Hardware testing under real operation conditions
• Fast
• Accurate
• Allows several iterations

APTIX System Explorer

ITALTEL FLEXBENCH

Page 127: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

2. Non-Frequent Reconfiguration

Page 128: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

3. Frequently Reconfigured

Computing systems that are able to adapt their behaviour and structure to changing operating and environmental conditions, time-varying optimization objectives, and physical constraints like changing protocols, new standards, or dynamically changing operation conditions of technical systems

Page 129: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Static & Dynamic Reconfiguration

Page 130: ENG6530 Reconfigurable Computing Systems Introduction to Reconfigurable Computing Introduction to Reconfigurable Computing.

Conclusions

Benefits of RCS

• Non-permanent customization and application development after fabrication (“Late Binding”)
• Achieve the high performance that real-time applications require
• Time-to-market (evolving requirements and standards, new ideas)

Disadvantages

• Efficiency penalty (area, performance, power) as compared to ASIC
• Ease of use as compared to GPP
