Hardware/Software Codesign of Embedded Systems

Page 1: Hardware/Software Codesign of Embedded Systems

Voicu Groza, 2008

SITE, 2008 - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS

1

Hardware/Software Codesign of Embedded Systems

Reconfigurable Computing

Voicu Groza
SITE Hall, Room 5017
562 5800 ext. 2159
[email protected]

Page 2: Hardware/Software Codesign of Embedded Systems

Outline

1. Introduction
2. Enabling Technologies
3. Fixed, configurable, reconfigurable ...
4. Reconfigurable Architectures
5. Run-Time-Reconfigurable System-on-Chip
6. Conclusion and Future Work
7. References

Page 3: Hardware/Software Codesign of Embedded Systems

1. Introduction

• Reconfigurable computing – Definition

• Why reconfigurable computing?

Page 4: Hardware/Software Codesign of Embedded Systems

Reconfigurable Computing - Definition

• Reconfigurable Computing (RC) = presence of hardware (HW) that can be reconfigured (reconfigware - RW)

• 1960: Gerald Estrin, “The UCLA Fixed-Plus-Variable (F+V) Structure Computer”

• DeHon and Wawrzynek: “computing via a postfabrication and spatially programmed connection of processing elements.”
– The architecture used in the computation is determined postfabrication and can therefore adapt to the characteristics of the executed algorithms.
– The computation is spatial, in contrast to the more temporal style associated with microprocessors.

Page 5: Hardware/Software Codesign of Embedded Systems

Re-inventing the wheel...

wire your own computer

Page 6: Hardware/Software Codesign of Embedded Systems

Why reconfigurable computing?

• Is your belt long enough?

• Embedded hand-held devices need to reduce
– power consumption,
– packaging and manufacturing costs,
– time-to-market

• High-performance computing

• Today’s computationally intensive applications require more processing power:
– streaming video,
– image recognition and processing,
– highly interactive services,
– telecommunications,
– genes

• Cray revived its latest entry-level XD1 supercomputer by combining AMD Opteron processors with FPGAs for compute acceleration in a Linux environment.

Page 7: Hardware/Software Codesign of Embedded Systems

Why reconfigurable computing … cont.

High-performance microprocessors
PRO:
• Versatile SW
• Off-the-shelf solution
CON:
• For some applications: might not be fast enough
• Power consumption (>100 W/gigaFLOP)
• Cost (+k$s)

Reconfigurable Computing Systems
PRO:
• Versatile SW & HW
• Computing structure matches the application
• A given fabric can implement numerous functional units
• Built out of off-the-shelf components, which reduces design time
CON:
• Wires are slow & big
• Bit-slices are costly to interconnect -> large silicon area & performance overhead
• Devices must store the configuration on the chip

Application-Specific Integrated Circuits (ASIC)
PRO:
• Does not suffer from the serial (and often slow and power-hungry) instruction fetch, decode and execute cycle that is at the heart of all microprocessors
• Consumes less power
CON:
• Fixed structure
• The cost of producing an ASIC (the mask’s cost = 1 M$)
• The time to develop a custom integrated circuit

Page 8: Hardware/Software Codesign of Embedded Systems

2. Enabling Technologies

• Programmable ICs: CPLD and FPGA (Xilinx 1984)

• HW Abstractions

– Fine-grained: reconfiguration is at the gate and register level.
  • By reconfiguration of registers, gates, and their interconnections, the internal structure of functional units is changed.
  • 2 major technologies:
    – Complex Programmable Logic Devices (CPLD) – EEPROM based
    – Field-Programmable Gate Arrays (FPGA) – SRAM based

– Coarse-grained: reconfiguration is based on a set of fixed blocks, like functional units, processor cores, and memory tiles.
  • The reconfiguration is merely the reprogramming of the interconnections between the fixed blocks.

Page 9: Hardware/Software Codesign of Embedded Systems

Complex Programmable Logic Devices (CPLD)

• Supplied with no predetermined logic function.

• Programmed by user to implement any digital logic function.

• Requires specialized computer software for design and programming.

• Complex PLD (CPLD) = A PLD that has several programmable sections with internal interconnections between the sections.

• The basic building block of a CPLD is a macrocell which implements a logic function that is synthesized into a sum of product equations, followed by a D-type register.

• Macrocells are grouped into logic blocks which are connected via a centralized interconnect array.
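
The macrocell described above can be sketched in a few lines of Python. This is only an illustrative model, not the structure of any particular device: the class name `Macrocell`, the dictionary encoding of product terms, and the `clock` method are all assumptions made for the example.

```python
class Macrocell:
    """Hypothetical macrocell model: sum of products followed by a D-type register."""

    def __init__(self, product_terms):
        # Each product term maps an input index to the value it requires;
        # inputs that are not listed are "don't care" for that term.
        self.product_terms = product_terms
        self.q = False  # state of the D-type register

    def combinational(self, inputs):
        # OR of AND terms (sum of products).
        return any(
            all(inputs[i] == required for i, required in term.items())
            for term in self.product_terms
        )

    def clock(self, inputs):
        # On a clock edge, the register captures the sum-of-products result.
        self.q = self.combinational(inputs)
        return self.q


# XOR of two inputs as two product terms: (a AND NOT b) OR (NOT a AND b)
xor_cell = Macrocell([{0: True, 1: False}, {0: False, 1: True}])
```

Reprogramming the device corresponds to constructing the macrocell with a different list of product terms; the register behaviour stays the same.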

Page 10: Hardware/Software Codesign of Embedded Systems

Altera MAX 7000 macrocell

Page 11: Hardware/Software Codesign of Embedded Systems

Field-Programmable Gate Array (FPGA)

[Figure: FPGA structure. Labels: universal gates and/or storage elements, interconnection network, switches]

• Reconfigurable functional units
– coarse-grained: ALUs and storage
– fine-grained: small lookup tables

Page 12: Hardware/Software Codesign of Embedded Systems

Basic ingredient: Look Up Table (LUT)

[Figure: look-up table with address inputs a0 and a1, stored data 0001, output labeled a1 & a2. Labels: memory elements: SRAM; logic cell]

Universal gate = Look-up table = memory
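
The idea that a universal gate is just a small memory can be shown with a short Python sketch. The class `LUT` and its `read` method are hypothetical names chosen for illustration, and the bit ordering (a0 as the least significant address bit) is an assumption.

```python
class LUT:
    """Hypothetical n-input look-up table: the inputs form a memory address."""

    def __init__(self, n_inputs, contents):
        if len(contents) != 2 ** n_inputs:
            raise ValueError("a LUT with n inputs stores 2**n bits")
        self.n_inputs = n_inputs
        self.contents = list(contents)  # stands in for the SRAM cells

    def read(self, *inputs):
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        # Assumption: a0 is the least significant address bit.
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.contents[address]


# Stored data 0001 (addresses 0..3) gives a 2-input AND;
# loading 0110 into the same structure gives XOR instead.
and_lut = LUT(2, [0, 0, 0, 1])
xor_lut = LUT(2, [0, 1, 1, 0])
```

Changing the function never changes the structure, only the stored bits, which is why SRAM-based configuration is enough to reprogram the logic.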

Page 13: Hardware/Software Codesign of Embedded Systems

Configurable Logic Blocks (CLB – Xilinx) / Logic Array Block (LAB – Altera)

[Figure: Xilinx Spartan-II CLB]

HW abstractions:

2 logic cells = 1 slice (Xilinx) or 1 Adaptive Logic Module (ALM – Altera)

2 slices = 1 Configurable Logic Block (CLB – Xilinx)

Page 14: Hardware/Software Codesign of Embedded Systems

Xilinx - Spartan II Architecture

• IOBs provide the interface between the package pins and the internal logic

• CLBs provide the functional elements for constructing most logic

• Dedicated block RAM memories (4096 bits each)

• Clock DLLs for clock distribution delay compensation and clock domain control

• Versatile multi-level interconnect structure

Page 15: Hardware/Software Codesign of Embedded Systems

Xilinx Virtex FPGA Model

[Figure: Virtex FPGA model. Labels: logic block (CLB), IO mux, switch matrix, line segments, Programmable Interconnect Point (PIP), SRAM, buffer]

Page 16: Hardware/Software Codesign of Embedded Systems

Virtex-II Architecture Overview

• DCM = Digital Clock Manager
• Block SelectRAM = 18 Kbit (2K x 9 bit) of dual-port RAM
• Multiplier blocks: 18-bit x 18-bit

1 CLB = 4 slices

1 slice contains 2 function generators, F and G, which are configurable as
• 4-input look-up tables (LUTs), or
• 16-bit shift registers, or
• 16-bit distributed SelectRAM memory.

Device      CLB array (Row x Col)   Logic Cells   Slices   Distributed RAM (Kb)   DSP   Block RAM blocks   Block RAM (Kb)
XC4VLX200   192 x 116               200,448       89,088   1392                   96    336                6,048
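
The XC4VLX200 figures can be cross-checked with simple arithmetic. The sketch below assumes four slices per CLB (the figure given on the Virtex II fabric slide later in the deck) and reads the last three numbers of the row as DSP blocks, block RAM count, and total block RAM in Kb; that column reading is an assumption, although it agrees with the 18 Kbit block size stated above. The function names are hypothetical.

```python
SLICES_PER_CLB = 4    # assumption: four slices per CLB
LUTS_PER_SLICE = 2    # two function generators (F and G) per slice
BLOCK_RAM_KBIT = 18   # one block SelectRAM holds 18 Kbit


def slice_count(clb_rows, clb_cols):
    """Slices in a device with a clb_rows x clb_cols array of CLBs."""
    return clb_rows * clb_cols * SLICES_PER_CLB


def lut_count(clb_rows, clb_cols):
    """4-input LUTs available in the same array."""
    return slice_count(clb_rows, clb_cols) * LUTS_PER_SLICE


def block_ram_kbit(block_count):
    """Total block RAM capacity in Kbit."""
    return block_count * BLOCK_RAM_KBIT


print(slice_count(192, 116))   # 192 * 116 * 4 slices
print(block_ram_kbit(336))     # 336 blocks * 18 Kbit
```

With these assumptions the computed slice count and block RAM capacity match the 89,088 slices and 6,048 Kb listed in the table.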

Page 17: Hardware/Software Codesign of Embedded Systems

3. Fixed, configurable, reconfigurable ...

• A simple classification:
1. Non-configurable computing
2. Configurable computing
3. Reconfigurable computing

• Each has its own characteristics, (dis)advantages and applications

Page 18: Hardware/Software Codesign of Embedded Systems

3.1. Non-Configurable Computing

• Uses fixed hardware such as ASICs or custom VLSI circuits (e.g. microprocessors like x86, SPARC, DEC, PowerPC, etc.)

• Long product turnaround time, usually around 3-6 months

• Optimized for performance

• Can be quite costly

• Hardwired, thus no room for error, re-work, or improvement

Execute

Page 19: Hardware/Software Codesign of Embedded Systems

3.2. Configurable Computing

• Configuring host supervises FPGA reconfiguration of a new bitstream

• A bitstream is a sequence of bits which represents the burn-in configuration of the Hardware Block (HB), e.g. a synthesized, placed and routed design

[Figure: labels: configuring host, bitstream, execute]

Page 20: Hardware/Software Codesign of Embedded Systems

3.2. Configurable Computing (Cont’d)

Advantages:
• Uses configurable hardware such as FPGA or CPLD
• PLDs are soft-wired for re-use of static hardware resources
• Cost effective
• Quick turnaround time
• Flexible and easy design process

Disadvantages:
• Inefficient use of hardware resources; cannot use idle FPGA area during run-time
• Slow reconfiguration time, because the entire FPGA is reconfigured for a single Hardware Block (HB)
• Thus, execution must stop while a new Hardware Block is being configured

Page 21: Hardware/Software Codesign of Embedded Systems

3.3. Reconfigurable Computing

[Figure: labels: configuring host, bitstreams, execute]

We could also use a placement algorithm to possibly fit all requested HBs into the FPGA

Page 22: Hardware/Software Codesign of Embedded Systems

3.3. Reconfigurable Computing (Cont’d)

Advantages:
– Same as Configurable Computing
– No need to completely stop the execution while reconfiguring the FPGA with a new HB
– Efficient use of static hardware resources; can swap out or move HBs around to fit new HBs on the FPGA, no need for a larger FPGA or a second one
– Fast reconfiguration times
– Run-time reconfiguration on the fly
– Less power consumption, as we can swap out HBs

Disadvantages:
– Routing HBs can be a heavy overhead for the configuring host, especially if HBs are too large or when defragmentation is necessary
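
Placement and defragmentation can be illustrated with a deliberately simple model. The sketch below is an assumption-laden toy, not the placement algorithm used in this work: it treats the FPGA as a row of columns, places each HB first-fit into a contiguous run of free columns, and defragments by sliding all HBs to the left. The class `ColumnAllocator` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnAllocator:
    """Toy first-fit placer: each HB occupies a contiguous run of columns."""
    total_columns: int
    placed: dict = field(default_factory=dict)  # HB name -> (start, width)

    def _free_gaps(self):
        gaps, cursor = [], 0
        for start, width in sorted(self.placed.values()):
            if start > cursor:
                gaps.append((cursor, start - cursor))
            cursor = start + width
        if cursor < self.total_columns:
            gaps.append((cursor, self.total_columns - cursor))
        return gaps

    def place(self, name, width):
        # First fit: take the leftmost gap that is wide enough.
        for start, size in self._free_gaps():
            if size >= width:
                self.placed[name] = (start, width)
                return True
        return False

    def remove(self, name):
        del self.placed[name]

    def defragment(self):
        # Slide every HB as far left as possible, preserving order.
        cursor = 0
        for name, (_, width) in sorted(self.placed.items(), key=lambda kv: kv[1][0]):
            self.placed[name] = (cursor, width)
            cursor += width


fpga = ColumnAllocator(total_columns=10)
fpga.place("A", 3)
fpga.place("B", 4)
fpga.place("C", 3)
fpga.remove("A")
fpga.remove("C")
fits_before = fpga.place("D", 5)   # six columns are free, but split 3 + 3
fpga.defragment()
fits_after = fpga.place("D", 5)    # one contiguous gap of six columns
```

In a real device every move made by `defragment` costs a reconfiguration of the moved HB, which is the overhead the disadvantage above refers to.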

Page 23: Hardware/Software Codesign of Embedded Systems

What is Run-Time Reconfiguration (RTR) ?

On-the-fly flexibility

Combines characteristics of co-processors with those of reconfigurable computing

Introduces overhead to reconfigure the co-processor, but offsets it by increasing execution speed (faster in H/W!)
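
The trade-off between reconfiguration overhead and faster execution can be made concrete with a break-even estimate. The sketch assumes a very simple cost model that is not taken from the slides: a one-time reconfiguration cost `t_reconfig`, a per-call software time `t_sw`, and a per-call hardware time `t_hw`, all in the same unit. Hardware wins after n calls when t_reconfig + n * t_hw < n * t_sw.

```python
import math


def break_even_calls(t_reconfig, t_sw, t_hw):
    """Smallest number of calls for which reconfiguring pays off.

    Returns None when the hardware version is not faster per call,
    in which case the overhead is never recovered.
    """
    if t_hw >= t_sw:
        return None
    # Need n > t_reconfig / (t_sw - t_hw); take the next integer above it.
    return math.floor(t_reconfig / (t_sw - t_hw)) + 1


# Illustrative numbers only: 10 ms to reconfigure,
# 1 ms per call in software, 0.5 ms per call in hardware.
calls = break_even_calls(10.0, 1.0, 0.5)
```

With these illustrative numbers, 20 calls only tie with software, so the HB has to be used at least 21 times before the reconfiguration is worthwhile.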

Page 24: Hardware/Software Codesign of Embedded Systems

4. Reconfigurable Architectures

1. External stand-alone processing unit
2. Attached processing unit
3. Reconfigurable functional unit
4. Co-processor
5. Processor embedded in a reconfigurable fabric

(Compton & Hauck)

Page 25: Hardware/Software Codesign of Embedded Systems

External stand-alone processing unit

The RECON System
John Reid Hauser, John Wawrzynek, Randy H. Katz
(University of California, Berkeley)

Consists of a SUN SparcStation host and a reconfigurable coprocessor board (The board exploits a XC4010 FPGA as the reconfigurable processor unit).

RPU coupled to the I/O system bus

Page 26: Hardware/Software Codesign of Embedded Systems

Attached processing unit

TKDM
Marco Platzner
ETH Zurich

• FPGA module that uses the DIMM (dual inline memory module) bus for high-bandwidth communication with the host CPU.

• It is integrated with the Linux host OS;

• offers functions for data communication and FPGA reconfiguration.

RPU coupled to the local bus

Page 27: Hardware/Software Codesign of Embedded Systems

Attached processing unit (Cont.)

MorphoSys
Nader Bagherzadeh
University of California, Irvine

• Coarse grain: MorphoSys operates on 8 / 16-bit data.

• Configuration: RC array is configured by context words, which specify an instruction opcode for RC.

• Depth of programmability: The Context Memory can store up to 32 planes of configuration.

• Dynamic reconfiguration: Contexts are loaded into Context Memory without interrupting RC operation.

• Local/Host Processor: The control processor (Tiny RISC) and RC Array are resident on the same chip.

• Fast Memory Interface: Through DMA controller.

• Consists of a combination of a RISC processor core with an array of coarse-grain reconfigurable cells;

• It utilizes a DMA controller in order to load the configuration data (context) into the Context Memory

Page 28: Hardware/Software Codesign of Embedded Systems

Reconfigurable functional unit

Chimaera
S. Hauck
University of Washington, Seattle

The system treats the reconfigurable logic as a cache for RPU instructions.

• Those instructions that have recently been executed, or that we can otherwise predict might be needed soon, are kept in the reconfigurable logic.

• If another instruction is required, it is brought into the RPU by overwriting one or more of the currently loaded instructions.
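
The cache behaviour described above can be sketched as follows. The slides do not say which loaded instruction gets overwritten, so the least-recently-used policy here is purely an assumption, as are the class name `RpuInstructionCache`, the fixed slot count, and the idea that each instruction occupies exactly one slot.

```python
from collections import OrderedDict


class RpuInstructionCache:
    """Toy model: the reconfigurable logic holds a few RPU instructions."""

    def __init__(self, capacity):
        self.capacity = capacity          # how many instructions fit at once
        self.loaded = OrderedDict()       # oldest use first, newest use last
        self.reconfigurations = 0

    def execute(self, instruction):
        if instruction in self.loaded:
            # Already configured: just mark it as recently used.
            self.loaded.move_to_end(instruction)
            return "hit"
        if len(self.loaded) >= self.capacity:
            # Assumed policy: overwrite the least recently used instruction.
            self.loaded.popitem(last=False)
        self.loaded[instruction] = True
        self.reconfigurations += 1
        return "miss"


cache = RpuInstructionCache(capacity=2)
results = [cache.execute(op) for op in ["a", "b", "a", "c", "b"]]
```

Every "miss" stands for a partial reconfiguration of the RPU, so the miss count is a rough proxy for reconfiguration overhead.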

RPU integrated in the CPU

Chimaera

Page 29: Hardware/Software Codesign of Embedded Systems

Co-processor

GARP
Hauser & Wawrzynek
University of California, Berkeley

• A reconfigurable architecture that combines reconfigurable hardware with a standard MIPS processor on the same die to retain better feature performance.

• Two configurations can never be active at the same time on its reconfigurable array, which can significantly reduce the overall performance of the system.

RPU coupled to the CPU

Page 30: Hardware/Software Codesign of Embedded Systems

5. RTR-SoC System Architecture

[Figure: RTR-SoC system architecture built around the IBM OPB. Callouts: runs software instructions; execution unit of HBs; stores HB bitstreams; stores program and data code; allows dedicated OMA-RPU access]

Page 31: Hardware/Software Codesign of Embedded Systems

Application and Reconfiguration Flows

• While the application flow runs on AE, RE sends RTR_PREP_HB to the ICAP controller to start loading the first HB bitstream onto the RPU.

• Once this HB is ready in the RPU, the ICAP sends back an RTR_ACK to the RE.

• The newly implemented HB on the RPU starts to work as soon as it is ENABLEd by the reconfiguration flow on RE.

• Upon completion, HB sets flag RTR_DONE to make the application flow aware that it is ready for use.

• Once the application flow on AE has prepared data that HB needs, AE asserts the flag DATA_READY.

• HB asserts EXE_DONE when it finishes its task and has prepared the results to be read by the application flow on AE.

• When the application flow needs these results, it checks the flag EXE_DONE, and waits if it is not yet set.

• The application flow gets the results and then asserts DATA_ACK to acknowledge to HB that it got the data.
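
The ordering of these signals can be traced with a small sequential model. The signal names come from the steps above; everything else is an assumption made for illustration: the function `run_handshake`, the (sender, receiver, signal) tuple format, and the use of an ordinary Python callable to stand in for the HB's computation. Real AE and RE flows run concurrently, which this sketch does not model.

```python
def run_handshake(hb_function, data):
    """Walk through one HB load-and-execute cycle and record the signals."""
    trace = []
    flags = set()

    def signal(sender, receiver, name):
        flags.add(name)
        trace.append((sender, receiver, name))

    # Reconfiguration flow (RE): load and enable the HB.
    signal("RE", "ICAP", "RTR_PREP_HB")   # start loading the HB bitstream
    signal("ICAP", "RE", "RTR_ACK")       # HB is ready in the RPU
    signal("RE", "HB", "ENABLE")
    signal("HB", "AE", "RTR_DONE")        # HB is ready for use

    # Application flow (AE): hand over data and collect the results.
    signal("AE", "HB", "DATA_READY")
    result = hb_function(data)            # stands in for the HB computation
    signal("HB", "AE", "EXE_DONE")
    if "EXE_DONE" not in flags:           # AE would wait here in a real system
        raise RuntimeError("results requested before EXE_DONE")
    signal("AE", "HB", "DATA_ACK")

    return result, [name for _, _, name in trace]


result, order = run_handshake(sum, [1, 2, 3])
```

The returned `order` list makes it easy to check that no flag is asserted before the one it depends on.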

Page 32: Hardware/Software Codesign of Embedded Systems

Final system architecture

RE

AE

Page 33: Hardware/Software Codesign of Embedded Systems

Tasks running on AE and RE

Page 34: Hardware/Software Codesign of Embedded Systems

Physical Layer Overview

• A physical layer has already been developed in JBits in order to evaluate RTR on a Xilinx Virtex device

• The physical layer has 3 main functions:
– modeling the FPGA resources,
– running a placement algorithm for the different Hardware Blocks, and
– managing the physical resources of the FPGA and any on-board peripherals.

JBits is a set of Java APIs and classes that provide a high-level language approach to developing reconfigurable systems, including run-time reconfiguration.

RTR Execution Model
• Bitstream(s) are read by the JBits App
• The JBits App configures the Virtex RC HW located in the PCI slot using the XHWIF API.
• XHWIF (Xilinx HardWare InterFace standard): Java interface for communicating with FPGA-based boards.
• This enables run-time reconfiguration of the Virtex device.

Page 35: Hardware/Software Codesign of Embedded Systems

Hardware Block (HB) Architecture

[Figure: HB architecture block diagram. Labels: Packer, Dispatcher, CU, I-Buffer, O-Buffer, PELM array, Mem req, Mem ack, r/w, Register Decoders, RS10 ... RS1n, RS20 ... RS2n, HB sel1, HB sel2, reg sel1, reg sel2, I/F, addr HB, addr MAB, data HB, Data_MAB, data_opb, r/w opb, r/w hb, ss opb, ss mc, valid, done, HBDU, HBIU]

• An HB is a functional hardware module that contains its own configuration (i.e. the bitstream), and state information (e.g. status and control registers) that define its current state.

• It is divided into two major components:
– The HB Dependent Unit (HBDU): encompasses several components that vary in functionality and magnitude depending on the functions supported by a particular HB.
– The HB Independent Unit (HBIU): designed as a core and hence follows a standardized implementation scheme for all HBs.

Page 36: Hardware/Software Codesign of Embedded Systems

Hardware Block Reconfiguration

• The HBs are partially reconfigured by the aforementioned Reconfigurable Processing Unit (RPU).

• The reconfiguration process is enabled by means of a Self-Reconfiguration Platform (SRP).
– It enables the FPGA to be dynamically reconfigured under the control of an embedded microprocessor.
– It is divided into a H/W component and a S/W component.

[Figure: SRP block diagram. Labels: MicroBlaze, OPB bus, control logic, BRAM, ICAP, FPGA configuration memory]

• The H/W component consists of four primary components: the Internal Configuration Access Port (ICAP), some control logic, a small configuration cache - Block RAM (BRAM), and an embedded processor.

• The S/W component implements an API that defines methods for accessing configuration logic through the ICAP port.

Page 37: Hardware/Software Codesign of Embedded Systems

PR Methodology: Xilinx Virtex II Architecture

[Figure: Virtex II floorplan. Labels: I/OBs, Block RAMs, Multipliers 18 x 18, Configurable Logic Blocks]

• The Virtex II FPGA fabric is composed of an array of Configurable Logic Blocks (CLBs).
• Each CLB contains four slices.
• Each slice contains two 4-input look-up tables and 2 D-type flip-flops to implement combinational and sequential circuits.
• Block RAMs (BRAM).
• Input/Output Blocks (IOBs).
• Special function blocks such as multipliers, PLLs etc.

Page 38: Hardware/Software Codesign of Embedded Systems

PR Methodology

– Bus Macros (BMs) are required between active and static modules of the design.

– The size and location of the reconfigurable module (active) is always fixed.

– The reconfigurable module is always the full height of the device;

– All logic resources located within the width of the module are considered part of the reconfigurable module’s bitstream frame. This includes slices, tri-state buffers (TBUFs), block RAMs (BRAMs), multipliers, input/output blocks (IOBs), and all routing resources.

Page 39: Hardware/Software Codesign of Embedded Systems

PR Methodology

Bus Macro Block Diagram

– Bus Macros (BMs) are predefined physical routing bridges that connect the active module to the static one.
– Any connection from active to static logic should always go through a bus macro.
– We chose the slice-based bus macros (over the TBUF-based ones) as they give a higher concentration of communication bits per CLB.
– Bus macros allow data to move in only one direction, either left-to-right or right-to-left.

Page 40: Hardware/Software Codesign of Embedded Systems

PR Methodology

Final Design Layout

[Figure: final design layout. Labels: Block RAMs, MicroBlaze A, MicroBlaze R, Active Block, Bus Macros, ICAP_VIRTEX2, Active Module, Fixed Module, R2L bus macro (bits 0..7), L2R bus macro (bits 7..0)]

The design contains only one active module. All other logic components are in the static module.

Page 41: Hardware/Software Codesign of Embedded Systems

PR Methodology

Xilinx Internal Configuration Access Port (ICAP)

[Figure: labels: MicroBlaze, OPB_BUS, OPB_HWICAP, ICAP, Control Logic, BRAM]

– Provides configuration interface to FPGA fabric.

– Cache BRAM to hold at least one frame.

– Control logic for the OPB bus interface.

– API calls to allow SW to read/write configuration memory.

Page 42: Hardware/Software Codesign of Embedded Systems

PR Methodology

• A partial bitstream is generated for the active (dynamic) part of the FPGA

• The device remains in full operation while the new partial bitstream is downloaded

• The full bitstream configuration must already be programmed into the device before downloading the partial bitstream.

• Multiple bitstreams can be generated, one for every variation of the partially reconfigurable module

• Partial bitstreams must be generated with the -g ActiveReconfig:Yes option; failing to use it will assert the global set/reset (GSR) during configuration, resetting the entire design

Page 43: Hardware/Software Codesign of Embedded Systems

PR Methodology

– Virtex-II configuration memory is arranged in vertical frames that are one bit wide and stretch from the top edge of the device to the bottom.

– These frames are the smallest addressable segments of the Virtex-II configuration memory space; therefore, all operations must act on whole configuration frames.

– The length of a Virtex-II frame is not fixed and depends on the size of the device.

– The number of frames per column type is constant for all devices.
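
Because a frame is the smallest addressable unit, changing even one configuration bit means reading a whole frame, modifying it, and writing the whole frame back. The sketch below models that read-modify-write cycle in Python. It is not the Xilinx API: `ConfigMemory`, `read_frame`, `write_frame`, and `set_config_bit` are hypothetical names, and the local `frame` list merely stands in for a one-frame cache such as the BRAM mentioned on the ICAP slide.

```python
class ConfigMemory:
    """Toy configuration memory that can only be accessed frame by frame."""

    def __init__(self, n_frames, frame_bits):
        self.frames = [[0] * frame_bits for _ in range(n_frames)]
        self.frame_writes = 0

    def read_frame(self, index):
        return list(self.frames[index])       # a copy, like a readback

    def write_frame(self, index, frame):
        if len(frame) != len(self.frames[index]):
            raise ValueError("only whole frames can be written")
        self.frames[index] = list(frame)
        self.frame_writes += 1


def set_config_bit(memory, frame_index, bit_index, value):
    """Change a single bit using a whole-frame read-modify-write."""
    frame = memory.read_frame(frame_index)    # 1. read the frame into the cache
    frame[bit_index] = value                  # 2. modify the cached copy
    memory.write_frame(frame_index, frame)    # 3. write the full frame back


memory = ConfigMemory(n_frames=4, frame_bits=8)
set_config_bit(memory, frame_index=2, bit_index=5, value=1)
```

The frame length is a constructor parameter here because, as noted above, it depends on the size of the device.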

Page 44: Hardware/Software Codesign of Embedded Systems

Reconfigurable Processing Unit

The RPU high-level block diagram

Page 45: Hardware/Software Codesign of Embedded Systems

Preliminary Results

• Xilinx Virtex-II Platform FPGAs were used to implement this system.

• Preliminary results were generated using ModelSim SE 5.7f.

Simulation results for the HB I/F interface. They illustrate how the I/F is used to enable proper synchronization between the reconfiguration flow and the application flow.

Page 46: Hardware/Software Codesign of Embedded Systems

6. Conclusion and Future Work

• A novel architecture of a RTR SoC is introduced

• RPU and HBs are designed

• This design targets adaptive embedded systems, DSP-related and low-power applications

• These functions are implemented as HBs and can be exploited in a multi-purpose environment. For example, the RTR SoC may execute various tasks to perform DSP-related functions, and subsequently be reconfigured into a high-performance measurement processing system

• Future designs would allow the user more flexibility by auto-reconfiguring the RPU depending on the computational and functional needs of its respective applications

• Real-time applications are our future target, as idle HBs can be swapped out of the RPU to save power or to allow for updates to the HBs

Page 47: Hardware/Software Codesign of Embedded Systems

References

• Marco Platzner, “Reconfigurable Computer Architectures,” e&i Elektrotechnik und Informationstechnik, 115(3):143-148, 1998. Springer.

• Y. Li, T. Callahan, E. Darnel, R. Harr, U. Kurkure and J. Stockwood, “Hardware-Software Co-Design of Embedded Reconfigurable Architectures,” Proceedings of the 37th Design Automation Conference (DAC), pp. 507-512, June 5-9, 2000.

• J. P. Heron, R. Woods, S. Sezer, and R. H. Turner, “Development of a run-time reconfiguration system with low reconfiguration overhead,” Journal of VLSI Signal Processing, 28(1/2):97-113, May 2001.

• “Xilinx Microblaze Soft Processor Core,” http://www.xilinx.com/ise/embedded/edk6_2docs/mb ref_guide.pdf, last accessed on October 19, 2004.

• G. Aggarwal, N. Thaper, K. Aggarwal, M. Balakrishnan, and S. Kumar, “A Novel Reconfigurable Co-Processor Architecture,” Proceedings of the Tenth International Conference on VLSI Design, pp. 370-375, January 1997.

• G. Haug and W. Rosenstiel, “Reconfigurable Hardware as Shared Resource in Multipurpose Computers,” in Reiner W. Hartenstein and Andres Keevallik, editors, Field-Programmable Logic: From FPGAs to Computing Paradigm, Springer-Verlag, pp. 149-158, Berlin, August/September 1998.

• “Xilinx Virtex-II Platform FPGAs: Complete Data Sheet,” DS031 (14 Oct. 2003).

• D. Wo and K. Forward, “Compiling to the Gate Level for a Reconfigurable Co-Processor,” Proceedings of FPGAs for Custom Computing Machines, pp. 147-154, 1994.

• V. Groza, R. Abielmona, M. El-Kadri, N. Sakr, and M. Elbadri, “A Reconfigurable Co-Processor for Adaptive Embedded Systems,” Workshop on Intelligent Solutions in Embedded Systems, Graz, Austria, June 2004.

• “IBM On-Chip Peripheral Bus,” http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/9A7AFA74DAD200D087256AB30005F0C8/$file/OpbBus.pdf, last accessed on October 19, 2004.

• R. Abielmona, V. Groza, N. Sakr, and J. Ho, “Low-Level Run-Time Reconfiguration of FPGAs for Dynamic Environments,” IEEE Canadian Conference on Electrical and Computer Engineering, CCECE 2003, Niagara Falls, May 2004.

• B. Blodget, P. James-Roxby, E. Keller, S. McMillian, and P. Sundararajan, “A Self reconfiguring Platform,” Proceedings of the International Conference on Field Programmable Logic, Lisbon, Portugal, Sept. 2003.

