Voicu Groza, 2008
SITE, 2008 - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS
Hardware/Software Codesign of
Embedded Systems
Reconfigurable Computing
Voicu Groza SITE Hall, Room 5017
562 5800 ext. [email protected]
Outline
1. Introduction
2. Enabling Technologies
3. Fixed, configurable, reconfigurable ...
4. Reconfigurable Architectures
5. Run-Time-Reconfigurable System-on-Chip
6. Conclusion and Future Work
7. References
1. Introduction
• Reconfigurable computing – Definition
• Why reconfigurable computing ?
Reconfigurable Computing - Definition
• Reconfigurable Computing (RC) = presence of hardware (HW) that can be reconfigured (reconfigware - RW)
• 1960: Gerald Estrin, “The UCLA Fixed-Plus-Variable (F+V) Structure Computer”
• DeHon and Wawrzynek: “computing via a postfabrication and spatially programmed connection of processing elements.”
– The architecture used in the computation is determined postfabrication and can therefore adapt to the characteristics of the executed algorithms.
– The computation is spatial, in contrast to the more temporal style associated with microprocessors.
Re-inventing the wheel...
wire your own computer
Why reconfigurable computing ?
• Is your belt long enough?
• Embedded hand-held devices need to reduce
– the power consumption targets,
– the acceptable packaging and manufacturing costs,
– the time-to-market
• High-performance computing
• Today’s computationally intensive applications require more processing power:
– streaming video,
– image recognition and processing,
– highly interactive services
– telecommunications
– genes
• Cray revived its latest entry-level XD1 supercomputer by combining AMD Opteron processors with FPGAs for compute acceleration in a Linux environment.
Why reconfigurable computing … cont.
High-performance microprocessors
PRO:
• Versatile SW
• Off-the-shelf solution
CON:
• For some applications, might not be fast enough
• Power consumption (>100 W/gigaFLOP)
• Cost (+k$s)
Reconfigurable Computing Systems
PRO:
• Versatile SW & HW
• Computing structure matches the application
• A given fabric can implement numerous functional units
• Built out of off-the-shelf components, which reduces design time
CON:
• Wires are slow & big
• Bit-slices are costly to interconnect -> large silicon area & performance overhead
• Devices must store the configuration on the chip
Application-Specific Integrated Circuits (ASIC)
PRO:
• Does not suffer from the serial (and often slow and power-hungry) instruction fetch, decode and execute cycle that is at the heart of all microprocessors
• Consumes less power
CON:
• Fixed structure
• The cost of producing an ASIC (the mask’s cost = 1 M$)
• The time to develop a custom integrated circuit
2. Enabling Technologies
• Programmable ICs: CPLD and FPGA (Xilinx 1984)
• HW Abstractions
– Fine-grained: reconfiguration is at the gate and register level.
• By reconfiguration of registers, gates, and their interconnections, the internal structure of functional units is changed.
• 2 major technologies:
– Complex Programmable Logic Devices (CPLD) – EEPROM based
– Field-Programmable Gate Arrays (FPGA) – SRAM based
– Coarse-grained: reconfiguration is based on a set of fixed blocks, like functional units, processor cores, and memory tiles.
• The reconfiguration is merely the reprogramming of the interconnections between the fixed blocks.
Complex Programmable Logic Devices (CPLD)
• Supplied with no predetermined logic function.
• Programmed by user to implement any digital logic function.
• Requires specialized computer software for design and programming.
• Complex PLD (CPLD) = A PLD that has several programmable sections with internal interconnections between the sections.
• The basic building block of a CPLD is a macrocell which implements a logic function that is synthesized into a sum of product equations, followed by a D-type register.
• Macrocells are grouped into logic blocks which are connected via a centralized interconnect array.
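The macrocell structure described above (a sum-of-products function followed by a D-type register) can be sketched in a few lines of Python. This is only an illustration, not a model of any vendor's device: the class name `Macrocell`, the dictionary encoding of product terms, and the `clock` method are assumptions made for the example.

```python
from typing import Dict, List

# A product term maps an input name to the level it must have
# (True = uncomplemented literal, False = complemented literal).
ProductTerm = Dict[str, bool]


class Macrocell:
    """Hypothetical macrocell: sum-of-products logic feeding a D-type register."""

    def __init__(self, product_terms: List[ProductTerm]) -> None:
        self.product_terms = product_terms
        self.q = False  # register output, cleared at start

    def combinational(self, inputs: Dict[str, bool]) -> bool:
        # OR together the AND of the literals in each product term.
        return any(
            all(inputs[name] == level for name, level in term.items())
            for term in self.product_terms
        )

    def clock(self, inputs: Dict[str, bool]) -> bool:
        # On a clock edge the register captures the sum-of-products value.
        self.q = self.combinational(inputs)
        return self.q


# XOR written as a sum of products: a AND (NOT b)  OR  (NOT a) AND b
xor_cell = Macrocell([{"a": True, "b": False}, {"a": False, "b": True}])
```

Calling `xor_cell.clock({"a": True, "b": False})` latches a logic 1 into the register; changing the list of product terms corresponds to reprogramming the device.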
Altera MAX 7000 macrocell
Field-Programmable Gate Array (FPGA)
[Figure labels: universal gates and/or storage elements; interconnection network; switches]
• Reconfigurable functional units
– coarse-grained: ALUs and storage
– fine-grained: small lookup tables
Basic ingredient: Look Up Table (LUT)
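[Figure: a look-up table whose SRAM memory elements hold the contents 0001; the inputs a0 and a1 address the memory, and the data output is labeled "a1 & a2"]
Logic cell: universal gate = look-up table = memory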
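The idea that a universal gate is just a small memory can be sketched as follows. The class name `LookUpTable` is hypothetical, and reading the contents string "0001" as the bits stored at addresses 0 to 3 is an assumption about the figure.

```python
class LookUpTable:
    """A k-input LUT modeled as a 2**k x 1-bit memory (hypothetical helper)."""

    def __init__(self, num_inputs: int, contents: str) -> None:
        if len(contents) != 2 ** num_inputs:
            raise ValueError("contents must hold 2**num_inputs bits")
        self.num_inputs = num_inputs
        # contents[i] is the bit stored at address i.
        self.memory = [int(bit) for bit in contents]

    def read(self, *inputs: int) -> int:
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        # inputs[0] is the least significant address bit.
        address = sum(bit << position for position, bit in enumerate(inputs))
        return self.memory[address]


# The same hardware becomes a different gate when its memory is rewritten.
and_lut = LookUpTable(2, "0001")  # only address 3 (both inputs high) stores 1
or_lut = LookUpTable(2, "0111")   # every address except 0 stores 1
```

Reconfiguring the logic cell amounts to writing a new contents string; the addressing logic itself never changes.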
Configurable Logic Blocks (CLB - Xilinx)
Logic Array Block (LAB – Altera)
[Figure: Xilinx Spartan II CLB]
HW abstractions:
2 logic cells = 1 slice (Xilinx) or = 1 Adaptive Logic Module (ALM - Altera)
2 slices = 1 Configurable Logic Block (CLB - Xilinx)
Xilinx - Spartan II Architecture
• IOBs provide the interface between the package pins and the internal logic
• CLBs provide the functional elements for constructing most logic
• Dedicated block RAM memories (4096 bits each)
• Clock DLLs for clock distribution delay compensation and clock domain control
• Versatile multi-level interconnect structure
Xilinx Virtex FPGA Model
[Figure labels: logic block; switch matrix; IO mux; CLB; line segments; programmable interconnect point (PIP); SRAM; buffer]
Virtex-II Architecture Overview
DCM = Digital Clock Manager
Block SelRAM = 18 Kbit (2k x 9 bit of dual-port RAM)
Multiplier blocks: 18-bit x 18-bit
1 CLB = 4 slices
1 slice contains 2 function generators, F & G, which are configurable as
• 4-input look-up tables (LUTs), or
• 16-bit shift registers, or
• 16-bit distributed SelectRAM memory.
Device: XC4VLX200
CLB array (Row x Col): 192 x 116
Logic cells: 200,448
Slices: 89,088
Distributed RAM (Kb): 1392
DSP slices: 96
Block SelRAM (18 Kbit blocks): 336
Block RAM (Kb): 6,048
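The XC4VLX200 figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers listed for that device; the variable names are illustrative, and the ratios it derives (four slices per CLB, 18 Kbit per block RAM, 2.25 logic cells per slice) are what those numbers imply rather than independent specifications.

```python
# Figures listed for the XC4VLX200.
rows, cols = 192, 116
logic_cells = 200_448
slices = 89_088
block_ram_blocks = 336
block_ram_kbits = 6_048

clbs = rows * cols                       # size of the CLB array
slices_per_clb = slices // clbs          # slices contained in one CLB
kbits_per_block = block_ram_kbits // block_ram_blocks
logic_cells_per_slice = logic_cells / slices

print(f"CLBs: {clbs}")
print(f"slices per CLB: {slices_per_clb}")
print(f"Kbit per block RAM: {kbits_per_block}")
print(f"logic cells per slice: {logic_cells_per_slice}")
```

The listed values agree with each other: 192 x 116 = 22,272 CLBs, 22,272 x 4 = 89,088 slices, and 336 x 18 = 6,048 Kbit of block RAM.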
3. Fixed, configurable, reconfigurable ...
• A simple classification:
1. Non-configurable computing
2. Configurable computing
3. Reconfigurable computing
• Each has its own characteristics, (dis)advantages and applications
3.1. Non-Configurable Computing
• Uses fixed hardware such as ASICs or custom VLSI circuits (e.g. microprocessors like x86, Sparc, DEC, PowerPC, etc.)
• Long product turnaround time, usually around 3-6 months
• Optimized for performance
• Can be quite costly
• Hardwired, thus no room for error, re-work, or improvement
Execute
3.2. Configurable Computing
• The configuring host supervises the FPGA reconfiguration with a new bitstream
• A bitstream is a sequence of bits that represents the burn-in configuration of a Hardware Block (HB), e.g. a synthesized, placed and routed design
[Figure labels: Configuring Host; Bitstream; Execute]
3.2. Configurable Computing (Cont’d)
Advantages:
• Uses configurable hardware such as FPGA or CPLD
• PLDs are soft wired for re-use of static hardware resources
• Cost effective
• Quick turnaround time
• Flexibility and ease in the design process
Disadvantages:
• Inefficient use of hardware resources: idle FPGA area cannot be used during run-time
• Slow reconfiguration time, because the entire FPGA is reconfigured for a single Hardware Block (HB)
• Thus, execution must stop while a new Hardware Block is being configured
3.3. Reconfigurable Computing
[Figure labels: Configuring Host; Bitstream; Execute]
We could also use a placement algorithm to possibly fit all requested HBs into the FPGA
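The slides do not say which placement algorithm is meant. As one possible illustration, the sketch below uses a first-fit strategy over a one-dimensional row of FPGA columns; the function names, the column-based model, and the HB names and widths are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def first_fit(fpga_columns: int, occupied: List[Tuple[int, int]],
              width: int) -> Optional[int]:
    """Return the leftmost start column where `width` free columns exist."""
    cursor = 0
    for start, end in sorted(occupied):  # each region is [start, end)
        if start - cursor >= width:
            return cursor
        cursor = max(cursor, end)
    if fpga_columns - cursor >= width:
        return cursor
    return None  # the HB does not fit without moving other HBs


def place_all(fpga_columns: int,
              requests: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Place each requested HB in order; None marks an HB that did not fit."""
    occupied: List[Tuple[int, int]] = []
    placement: Dict[str, Optional[int]] = {}
    for name, width in requests.items():
        start = first_fit(fpga_columns, occupied, width)
        placement[name] = start
        if start is not None:
            occupied.append((start, start + width))
    return placement


# Hypothetical HBs and widths (in columns) on a 10-column device.
layout = place_all(10, {"fir": 4, "fft": 5, "crc": 2})
print(layout)
```

In this run the third block does not fit in the single remaining column, which is the situation where swapping out or moving HBs becomes necessary.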
3.3. Reconfigurable Computing (Cont’d)
Advantages:
– Same as Configurable Computing
– No need to completely stop the execution while reconfiguring the FPGA with a new HB
– Efficient use of static hardware resources; HBs can be swapped out or moved around to fit new HBs on the FPGA, with no need for a larger FPGA or a second one
– Fast reconfiguration times
– Run-time reconfiguration on the fly
– Less power consumption, as HBs can be swapped out
Disadvantages:
– Routing HBs can be a heavy overhead for the configuring host, especially if HBs are too large or when defragmentation is necessary
What is Run-Time Reconfiguration (RTR) ?
On-the-fly flexibility
Combines characteristics of co-processors with those of reconfigurable computing
Introduces overhead to reconfigure the co-processor but offsets by increasing execution speed (faster in H/W!)
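The trade-off between reconfiguration overhead and faster hardware execution can be made concrete with a small break-even calculation. This is a simplified model under stated assumptions: one reconfiguration is followed by a number of calls to the same function, and the timing values in the usage lines are hypothetical, not measurements.

```python
import math
from typing import Optional


def rtr_speedup(t_sw: float, t_hw: float, t_reconfig: float, calls: int) -> float:
    """Speedup of `calls` hardware executions (after one reconfiguration)
    over the same number of software executions."""
    return (calls * t_sw) / (t_reconfig + calls * t_hw)


def break_even_calls(t_sw: float, t_hw: float, t_reconfig: float) -> Optional[int]:
    """Smallest number of calls for which hardware is faster overall."""
    if t_hw >= t_sw:
        return None  # hardware never pays off if it is not faster per call
    # Need calls * t_sw > t_reconfig + calls * t_hw.
    return math.floor(t_reconfig / (t_sw - t_hw)) + 1


# Hypothetical timings in arbitrary units.
n = break_even_calls(t_sw=10, t_hw=2, t_reconfig=40)
print(n, rtr_speedup(10, 2, 40, n))
```

With these example values the overhead is recovered from the sixth call onward; with fewer calls the software version would have been at least as fast.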
4. Reconfigurable Architectures
1. External stand-alone processing unit
2. Attached processing unit
3. Reconfigurable functional unit
4. Co-processor
5. Processor embedded in a reconfigurable fabric
(Compton & Hauck)
External stand-alone processing unit
The RECON System
John Reid Hauser, John Wawrzynek, Randy H. Katz (University of California, Berkeley)
Consists of a SUN SparcStation host and a reconfigurable coprocessor board (the board uses an XC4010 FPGA as the reconfigurable processing unit).
RPU coupled to the I/O system bus
Attached processing unit
TKDM
Marco Platzner, ETH Zurich
• FPGA module that uses the DIMM (dual inline memory module) bus for high-bandwidth communication with the host CPU.
• It is integrated with the Linux host OS.
• It offers functions for data communication and FPGA reconfiguration.
RPU coupled to the local bus
Attached processing unit (Cont.)
MorphoSys
Nader Bagherzadeh, University of California, Irvine
• Coarse grain: MorphoSys operates on 8 / 16-bit data.
• Configuration: RC array is configured by context words, which specify an instruction opcode for RC.
• Depth of programmability: The Context Memory can store up to 32 planes of configuration.
• Dynamic reconfiguration: Contexts are loaded into Context Memory without interrupting RC operation.
• Local/Host Processor: The control processor (Tiny RISC) and RC Array are resident on the same chip.
• Fast Memory Interface: Through DMA controller.
• Consists of a combination of a RISC processor core with an array of coarse-grain reconfigurable cells;
• It utilizes a DMA controller in order to load the configuration data (context) into the Context Memory
Reconfigurable functional unit
Chimaera
S. Hauck, University of Washington, Seattle
The system treats the reconfigurable logic as a cache for RPU instructions.
• Those instructions that have recently been executed, or that we can otherwise predict might be needed soon, are kept in the reconfigurable logic.
• If another instruction is required, it is brought into the RPU by overwriting one or more of the currently loaded instructions.
RPU integrated in the CPU
Chimaera
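The caching behaviour described for Chimaera can be sketched as a small software model. The slide does not state a replacement policy; the sketch below assumes least-recently-used replacement, and the class name, capacity, and instruction names are hypothetical.

```python
from collections import OrderedDict
from typing import List


class RpuInstructionCache:
    """Toy model: reconfigurable logic holding a few RPU instructions."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.loaded: "OrderedDict[str, str]" = OrderedDict()
        self.reconfigurations = 0

    def execute(self, name: str) -> bool:
        """Return True on a hit, False when the instruction had to be loaded."""
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return True
        if len(self.loaded) >= self.capacity:
            # Overwrite the least recently used instruction (assumed policy).
            self.loaded.popitem(last=False)
        self.loaded[name] = f"config:{name}"
        self.reconfigurations += 1
        return False


cache = RpuInstructionCache(capacity=2)
trace: List[str] = ["add3", "popcnt", "add3", "crc", "popcnt"]
hits = [cache.execute(op) for op in trace]
print(hits, cache.reconfigurations)
```

Every miss in this model stands for a partial reconfiguration of the RPU, so the hit rate directly determines how much reconfiguration overhead the processor sees.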
Co-processor
GARP
Hauser & Wawrzynek, University of California, Berkeley
• A reconfigurable architecture that combines reconfigurable hardware with a standard MIPS processor on the same die to achieve better performance.
• Two configurations can never be active at the same time on its reconfigurable array, which can significantly reduce the overall performance of the system.
RPU coupled to the CPU
5. RTR-SoC System Architecture
[Figure: RTR-SoC block diagram. Labels: IBM OPB; runs software instructions; execution unit of HBs; stores HB bitstreams; stores program and data code; allows dedicated OMA-RPU access]
Application and Reconfiguration Flows
• While the application flow runs on AE, RE sends RTR_PREP_HB to the ICAP controller to start loading the first HB bitstream onto the RPU.
• Once this HB is ready in the RPU, the ICAP sends back an RTR_ACK to the RE.
• The newly implemented HB on the RPU starts to work as soon as it is ENABLEd by the reconfiguration flow on RE.
• Upon completion, the HB sets the flag RTR_DONE to make the application flow aware that it is ready for use.
• Once the application flow on AE has prepared the data that the HB needs, AE asserts the flag DATA_READY.
• The HB asserts EXE_DONE when it finishes its task and has prepared the results to be read by the application flow on AE.
• When the application flow needs these results, it checks the flag EXE_DONE, and waits if it is not yet set.
• The application flow gets the results and then asserts DATA_ACK to acknowledge to the HB that it got the data.
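The ordering of the signals listed above can be replayed in a small sequential model. This is a sketch of the handshake order only, not of the hardware: the function name `run_handshake`, the event-log format, and the use of a sum as the HB's function are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def run_handshake(operands: List[int]) -> Tuple[List[str], int]:
    """Replay the signal sequence for one HB and return (event log, result)."""
    log: List[str] = []
    flags: Set[str] = set()

    def raise_flag(owner: str, flag: str) -> None:
        flags.add(flag)
        log.append(f"{owner}: {flag}")

    def wait_for(flag: str) -> None:
        # Real hardware would poll or stall; this model is sequential, so a
        # missing flag means the protocol order was violated.
        if flag not in flags:
            raise RuntimeError(f"{flag} not set")

    # Reconfiguration flow on RE.
    raise_flag("RE", "RTR_PREP_HB")   # request to load the HB bitstream
    raise_flag("ICAP", "RTR_ACK")     # HB is ready in the RPU
    wait_for("RTR_ACK")
    raise_flag("RE", "ENABLE")
    wait_for("ENABLE")
    raise_flag("HB", "RTR_DONE")      # HB is ready for use

    # Application flow on AE.
    wait_for("RTR_DONE")
    raise_flag("AE", "DATA_READY")
    wait_for("DATA_READY")
    result = sum(operands)            # placeholder for the HB's function
    raise_flag("HB", "EXE_DONE")
    wait_for("EXE_DONE")
    raise_flag("AE", "DATA_ACK")
    return log, result


log, result = run_handshake([1, 2, 3])
print("\n".join(log))
```

The log makes the two phases visible: the first four events belong to the reconfiguration flow, the last three to the data exchange between the application flow and the HB.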
Final system architecture
RE
AE
Tasks running on AE and RE
Physical Layer Overview
• A physical layer has already been developed in JBits in order to evaluate RTR on a Xilinx Virtex device
• The physical layer has 3 main functions:
– modeling the FPGA resources,
– running a placement algorithm for the different Hardware Blocks, and
– managing the physical resources of the FPGA and any on-board peripherals.
JBits is a set of Java APIs and classes that provide a high-level language approach to developing reconfigurable systems, including run-time reconfiguration.
RTR Execution Model
• Bitstream(s) are read by the JBits App
• The JBits App configures the Virtex RC HW located in the PCI slot using the XHWIF API.
• XHWIF (Xilinx HardWare InterFace Standard): Java interface for communicating with FPGA-based boards.
• This enables run-time reconfiguration of the Virtex device.
Hardware Block (HB) Architecture
[Figure: HB block diagram showing the HBDU and the HBIU. Labels: Packer; Dispatcher; CU; I-Buffer; O-Buffer; PELM array; Mem req; Mem ack; r/w; done; two Register Decoders (RS10 … RS1n, RS20 … RS2n); HB sel1; HB sel2; reg sel1; reg sel2; I/F; addr HB; addr MAB; Data_MAB; valid; r/w opb; data_opb; data HB; ss opb; r/w hb; ss mc]
• An HB is a functional hardware module that contains its own configuration (i.e. the bitstream), and state information (e.g. status and control registers) that define its current state.
• It is divided into two major components:
– The HB Dependent Unit (HBDU): encompasses several components that vary in functionality and magnitude depending on the functions supported by a particular HB.
– The HB Independent Unit (HBIU): designed as a core, and hence follows a standardized implementation scheme for all HBs.
Hardware Block Reconfiguration
• The HBs are partially reconfigured by the aforementioned Reconfigurable Processing Unit (RPU).
• The reconfiguration process is enabled by means of a Self-Reconfiguration Platform (SRP).
– It enables the FPGA to be dynamically reconfigured under the control of an embedded microprocessor.
– It is divided into a H/W component and a S/W component.
[Figure labels: ICAP; Control Logic; BRAM; FPGA Configuration Memory; MicroBlaze; OPB Bus]
• The H/W component consists of four primary components: the Internal Configuration Access Port (ICAP), some control logic, a small configuration cache - Block RAM (BRAM), and an embedded processor.
• The S/W component implements an API that defines methods for accessing configuration logic through the ICAP port.
PR Methodology: Xilinx Virtex II Architecture
[Figure labels: I/O blocks; Block RAMs; Multipliers 18 x 18; Configurable Logic Block]
• The Virtex II FPGA fabric is composed of an array of Configurable Logic Blocks (CLBs).
• Block RAMs (BRAM).
• Input/Output Blocks (IOBs).
• Special function blocks such as multipliers, PLLs etc.
• Each CLB contains four slices.
• Each slice contains two 4-input look-up tables and 2 D-type flip-flops to implement combinational and sequential circuits.
PR Methodology
– Bus Macros (BMs) are required between active and static modules of the design.
– The size and location of the reconfigurable module (active) is always fixed.
– The reconfigurable module is always the full height of the device;
– All logic resources located within the width of the module are considered part of the reconfigurable module’s bitstream frame. This includes slices, tri-state buffers (TBUFs), block RAMs (BRAMs), multipliers, input/output blocks (IOBs), and all routing resources.
PR Methodology: Bus Macro Block Diagram
– Bus Macros (BMs) are predefined physical routing bridges that connect the active module to the static one.
– Any connection from active to static logic should always go through a bus macro.
– We chose the slice-based bus macros (over the TBUF-based ones) as they give a higher concentration of communication bits per CLB.
– A bus macro allows data to move in only one direction, either left-to-right or right-to-left.
PR Methodology: Final Design Layout
[Figure: final design layout with an active module and a fixed module. Labels: Block RAMs; MicroBlaze A; MicroBlaze R; Active Block; Bus Macros; ICAP_VIRTEX2; R2L bus macro (7..0); L2R bus macro (0..7)]
The design contains only one active module. All other logic components are in the static module.
PR Methodology: Xilinx Internal Configuration Access Port (ICAP)
[Figure labels: MicroBlaze; OPB_BUS; OPB_HWICAP; ICAP; Control Logic; BRAM]
– Provides a configuration interface to the FPGA fabric.
– Cache BRAM to hold at least one frame.
– Control logic for the OPB bus interface.
– API calls to allow SW to read/write configuration memory.
PR Methodology
• A partial bitstream is generated for the active (dynamic) part of the FPGA
• The device remains in full operation while the new partial bitstream is downloaded
• The full bitstream configuration must already be programmed into the device before downloading the partial bitstream.
• Multiple bitstreams can be generated, one for every variation of the partially reconfigurable module
• The bitstream must be generated with the -g ActiveReconfig:Yes option. Failing to use this option will assert the global set/reset (GSR) during configuration, resetting the entire design
PR Methodology
– Virtex-II configuration memory is arranged in vertical frames that are one bit wide and stretch from the top edge of the device to the bottom.
– These frames are the smallest addressable segments of the Virtex-II configuration memory space; therefore, all operations must act on whole configuration frames.
– The length of a Virtex-II frame is not fixed and depends on the size of the device.
– The number of frames per column type is constant for all devices.
Reconfigurable Processing Unit
The RPU high-level block diagram
Preliminary Results
• Xilinx Virtex-II Platform FPGAs were used to implement this system.
• Preliminary results were generated using ModelSim SE 5.7f.
[Figure: simulation results for the HB I/F interface. They illustrate how the I/F is used to enable proper synchronization between the reconfiguration flow and the application flow.]
6. Conclusion and Future Work
• A novel architecture of a RTR SoC is introduced
• RPU and HBs are designed
• This design targets adaptive embedded systems, DSP-related and low-power applications
• These functions are implemented as HBs and can be exploited in a multi-purpose environment. For example, the RTR SoC may execute various tasks to perform DSP-related functions, and subsequently be reconfigured into a high-performance measurement processing system
• Future designs would allow the user more flexibility by auto-reconfiguring the RPU depending on the computational and functional needs of its respective applications
• Real-time applications are our future target, as idle HBs can be swapped out of the RPU to save power or to allow for updates to the HBs
References
• M. Platzner, “Reconfigurable Computer Architectures,” e&i Elektrotechnik und Informationstechnik, 115(3):143-148, 1998. Springer.
• Y. Li, T. Callahan, E. Darnel, R. Harr, U. Kurkure, and J. Stockwood, “Hardware-Software Co-Design of Embedded Reconfigurable Architectures,” Proceedings of the 37th Design Automation Conference (DAC), pages 507-512, June 5-9, 2000.
• J. P. Heron, R. Woods, S. Sezer, and R. H. Turner, “Development of a run-time reconfiguration system with low reconfiguration overhead,” Journal of VLSI Signal Processing, 28(1/2):97-113, May 2001.
• “Xilinx Microblaze Soft Processor Core,” http://www.xilinx.com/ise/embedded/edk6_2docs/mb ref_guide.pdf, last accessed on October 19, 2004.
• G. Aggarwal, N. Thaper, K. Aggarwal, M. Balakrishnan, and S. Kumar, “A Novel Reconfigurable Co-Processor Architecture,” Proceedings of the Tenth International Conference on VLSI Design, pages 370-375, January 1997.
• G. Haug and W. Rosenstiel, “Reconfigurable Hardware as Shared Resource in Multipurpose Computers,” in Reiner W. Hartenstein and Andres Keevallik, editors, Field-Programmable Logic: From FPGAs to Computing Paradigm, Springer-Verlag, pages 149-158, Berlin, August/September 1998.
• “Xilinx Virtex-II Platform FPGAs: Complete Data Sheet,” DS031 (14 Oct. 2003).
• D. Wo and K. Forward, “Compiling to the Gate Level for a Reconfigurable Co-Processor,” Proceedings of FPGAs for Custom Computing Machines (1994), pages 147-154.
• V. Groza, R. Abielmona, M. El-Kadri, N. Sakr, and M. Elbadri, “A Reconfigurable Co-Processor for Adaptive Embedded Systems,” Workshop on Intelligent Solutions in Embedded Systems, Graz, Austria, June 2004.
• “IBM On-Chip Peripheral Bus,” http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/9A7AFA74DAD200D087256AB30005F0C8/$file/OpbBus.pdf, last accessed on October 19, 2004.
• R. Abielmona, V. Groza, N. Sakr, and J. Ho, “Low-Level Run-Time Reconfiguration of FPGAs for Dynamic Environments,” IEEE Canadian Conference on Electrical and Computer Engineering, CCECE 2003, Niagara Falls, May 2004.
• B. Blodget, P. James-Roxby, E. Keller, S. McMillian, and P. Sundararajan, “A Self reconfiguring Platform,” Proceedings of the International Conference on Field Programmable Logic, Lisbon, Portugal, Sept. 2003.