Date post: | 17-Jan-2016 |
Upload: | eileen-lloyd |
ENG6530 RCS 1
ENG6530 Reconfigurable Computing Systems
Hardware/Software Co-design
Topics
• H/S Co-Design Definition
• Motivation
• Design Steps: Profiling, Partitioning, Allocation
• Xilinx EDK
References
• “Embedded System Design: A Unified Hardware/Software Introduction”, Frank Vahid, Wiley, 2002.
• “Hardware/Software Codesign: A systematic approach targeting data-intensive applications”, Wayne Luk, IEEE Signal Processing Magazine, May 2005.
• “Hardware-Software Co-synthesis for Digital Systems”, R. Gupta, G. De Micheli, IEEE Design & Test of Computers, September 1993, pp. 29-41.
• “Hardware/Software Design Space Exploration for a Reconfigurable Processor”, A. Rosa, 2003.
• “A Framework for Hardware/Software Co-design”, S. Kumar, Q. Wulf, IEEE 1993.
Definition – Hardware/Software Co-Design
• The design of computer systems that incorporate both standardized off-the-shelf processors (software) and specialized hardware.
• The cooperative design of hardware and software components.
• The unification of currently separate hardware and software paths.
• The movement of functionality between hardware and software.
H/S Co-design: Example
• Optical wheel speed sensor.
• System constraints: area – 40 units, time – 100 cycles.
• This could be implemented using standardized processors, specialized hardware, or a combination of both.
[Figure: processing chain with blocks Input Decoding, FIR Filter, Tick to Speed Inversion, Output Encoding]
H/S Co-design: Software
• Design implemented in software (Processor #1 and Processor #2).
• System constraints:
  – Area – 48 units > 40 units
  – Time – 132 cycles > 100 cycles
• Design time – 2 months
H/S Co-design: Hardware
• Design implemented in custom RTL hardware.
• System constraints:
  – Area – 24 units < 40 units
  – Time – 52 cycles << 100 cycles
• Beats both the area and timing constraints by at least 40% (area by 40%, time by 48%).
• Design time – 9 months
• A delay of this length is unacceptable in a competitive world.
H/S Co-design
• Design implemented in hardware & software (Processor #1 plus custom hardware).
• System constraints:
  – Area – 37 units < 40 units
  – Time – 95 cycles < 100 cycles
• Design time – 3.5 months
• Not as efficient as the custom hardware design; however, it establishes a balance between the two extremes.
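The comparison across the three wheel speed sensor designs can be checked mechanically. The sketch below uses only the area, cycle and design-time figures quoted on the slides; the dictionary layout and the function name `meets_constraints` are illustrative assumptions.

```python
# Constraints from the example slide: area 40 units, time 100 cycles.
AREA_LIMIT = 40
CYCLE_LIMIT = 100

# Figures quoted on the three implementation slides.
designs = {
    "software only": {"area": 48, "cycles": 132, "months": 2.0},
    "custom hardware": {"area": 24, "cycles": 52, "months": 9.0},
    "hw/sw co-design": {"area": 37, "cycles": 95, "months": 3.5},
}


def meets_constraints(design):
    """True when a design satisfies both the area and the timing limit."""
    return design["area"] <= AREA_LIMIT and design["cycles"] <= CYCLE_LIMIT


# Keep only the feasible designs, then pick the one that ships first.
feasible = {name: d for name, d in designs.items() if meets_constraints(d)}
fastest_to_market = min(feasible, key=lambda name: feasible[name]["months"])

print(sorted(feasible))
print(fastest_to_market)
```

Only the custom hardware and co-design options pass both limits, and of those the co-design option has the shorter design time, which is the balance the slide describes.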
Motivations
• Achieve performance by moving software bottlenecks to hardware.
• Use hardware to meet time & area constraints which cannot be met using general purpose processors alone.
• It is not possible to put everything in hardware due to limited resources.
• Some code is more appropriate for sequential implementation (i.e. achieve flexibility).
• Today’s designs are focusing on Embedded Systems, which require both hardware and software modules.
Motivations … cont
• The complexity and functionality of computer systems, e.g. System-on-Chip (SoC), are increasing at a dramatic rate.
• It is difficult for custom systems to be designed, built and verified within an acceptable time period, even with advanced CAD tools, unless standardized parts are used. (Solution?)
• Take advantage of previously designed and tested processors and IP cores to reduce design time and improve reliability.
Trade-offs/Decisions
• Given a set of specified goals, an implementation technology, and constraints, designers consider trade-offs in how hardware and software components work together.
• Decisions, constraints and evaluations involve:
  – Performance
  – Area
  – Power
  – Flexibility (programmability)
  – Development & manufacturing costs
  – Reliability
  – Robustness
  – Maintenance
  – Design evolution
Hw/Sw Co-Design: Research
Research in hardware-software co-design encompasses many areas, such as:
• System specification and modeling
• Design exploration
• Partitioning
• Scheduling
• System co-verification and co-simulation
• Code generation for hardware/software
• Hardware/software interfacing
However, the most important objective is to develop a unified design methodology/tool for creating systems containing both hardware and software.
A Simple Approach
[Figure: flow diagram with blocks Application, Profiling, Partitioning (S/W, H/W), Schedule tasks, Evaluation, Decision]
Profiling and Partitioning
[Figure: a profiler identifies the critical regions of a software application; those regions are moved from the processor to an ASIC/FPGA while the remaining code stays in software. Bar charts compare time and energy for SW only versus HW/SW.]
Benefits:
• Speedups of 2X to 10X typical
• Far more potential than dynamic SW optimizations (1.2x)
• Energy reductions of 25% to 95% typical
Profiling
• Profiling allows you to learn where your program spent its time and which functions called which other functions while it was executing.
• The profiler uses information collected during the actual execution of your program; therefore, it can be used on programs that are too large or too complex to analyze by reading the source.
• This information can show you which pieces of your program are slower than you expected. These might be candidates for either:
  – Rewriting code to make your program execute faster.
  – Moving these functions to hardware.
Profiling: Steps
• You must compile and link your program with profiling enabled.
    cc -o myprog.exe myprog.c utils.c -g -pg
• You must then execute your program to generate a profile data file.
  – Your program will write the profile data into a file called `gmon.out` just before exiting.
• You must run gprof to analyze the profile data.
    gprof options myprog.exe gmon.out > outfile
  – The gprof program prints a flat profile and a call graph.
Profiling: Useful Hints
Options:
• -e function_name : tells gprof NOT to print information about the function function_name (and its children …) in the call graph.
• -f function_name : causes gprof to limit the call graph to the function function_name and its children.
• -b : gprof doesn’t print the verbose blurbs that try to explain the meaning of all of the fields in the tables.
Profiling: Flat Profile
• % time: the percentage of the total execution time your program spent in this function.
• cumulative seconds: the cumulative total number of seconds the computer spent executing this function, plus the time spent in all the functions above it in the table.
• self seconds: the number of seconds accounted for by this function alone.
• calls: the total number of times the function was called.
• self ms/call: the average number of milliseconds spent in this function per call.
• total ms/call: the average number of milliseconds spent in this function and its descendants per call.
• name: the name of the function.
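The flat profile fields above can be read programmatically to shortlist functions for hardware. The following is a minimal sketch, assuming the usual whitespace-separated column layout of a gprof flat profile; the sample rows, the function names in them and the 20% threshold are made up for illustration and are not measurements from the course.

```python
# Hypothetical flat-profile excerpt in the column order described above.
SAMPLE = """\
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 62.50      0.50     0.50     1000     0.50     0.50  fir_filter
 25.00      0.70     0.20      500     0.40     0.40  decode_input
 12.50      0.80     0.10        1   100.00   800.00  main
"""


def parse_flat_profile(text):
    """Return one dict per data row; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue
        try:
            percent, _cumulative, self_seconds = (float(x) for x in parts[:3])
        except ValueError:
            continue  # header line, not a data row
        rows.append({"name": parts[-1],
                     "percent": percent,
                     "self_seconds": self_seconds})
    return rows


def hardware_candidates(rows, threshold=20.0):
    """Functions whose share of run time reaches the (assumed) threshold."""
    return [r["name"] for r in rows if r["percent"] >= threshold]


rows = parse_flat_profile(SAMPLE)
print(hardware_candidates(rows))
```

A function that dominates the % time column is a candidate either for rewriting or for moving to hardware; the threshold is a designer's choice, not something gprof provides.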
Simple Approach: Drawbacks
I. Some functions might not be easily mapped onto hardware.
II. Decisions taken very early at profiling phase might not be optimal.
III. No consideration for interfacing and communication.
IV. If the application changes slightly then we need to re-profile and re-partition.
Applications Not Suitable for RCS
Not all applications are suitable for Reconfigurable Computing:
• Applications that involve extensive recursion, for example, are a poor match because the synthesized “hardware” must be of fixed size.
• Applications that have only a small percentage of parallelism (1-5%) will not take advantage of RCS.
• Applications that are I/O bound will also suffer due to memory I/O transfer.
• Applications that require floating point arithmetic.
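The point about a small percentage of parallelism can be made concrete with Amdahl's law, which is not named on the slide but is the standard way to bound the gain. The sketch below assumes that only the parallel fraction of the run time is accelerated; the function name and the chosen fractions are illustrative.

```python
def amdahl_speedup(parallel_fraction, accel):
    """Overall speedup when `parallel_fraction` of the run time is sped up
    by a factor `accel` and the rest is left unchanged (Amdahl's law)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / accel)


# Even an infinitely fast accelerator barely helps a 5%-parallel program.
best_case_5_percent = amdahl_speedup(0.05, float("inf"))

# A 90%-parallel program with a 10x accelerator gains far more.
speedup_90_percent = amdahl_speedup(0.90, 10.0)

print(round(best_case_5_percent, 3), round(speedup_90_percent, 3))
```

With 5% parallelism the speedup is bounded by 1/0.95, about 1.05x, regardless of how much hardware is spent.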
Design Space Exploration
[Figure: computation templates (RISC, DSP, FPGA, SDRAM, LookUp, Cipher), communication templates, and scheduling/arbitration policies (static, dynamic, fixed priority, proportional share, EDF, TDMA, FCFS, WFQ) are combined into Architecture # 1 and Architecture # 2.]
Which architecture is better suited for our application?
H/S Codesign: A Framework
• System Representation
• Decomposition (break down system functions into a collection of sub-functions)
• H/S Partitioning (determine which of the sub-functions should be implemented in hardware or software)
• Refinement (produce a hardware/software alternative via evaluation)
• System Evaluation
• System Integration
Co-Synthesis/Co-Design

Partitioning & Scheduling
• Task partitioning and task scheduling are required in many applications, for instance co-design systems, Multi Processing Systems and High Level Synthesis.
• Sub-tasks extracted from the input description should be implemented:
  – Where? In the right place (using the Partitioner/Placer)
  – When? At the right time (using the scheduler)
• It is well known that such scheduling and partitioning problems are NP-complete.
• Optimization techniques based on heuristic methods are generally employed to explore the search space so that feasible and near-optimal solutions can be found.
System Partitioning
Good partitioning mechanism:
1) Minimizes communication across the bus
2) Allows parallelism: both hardware (FPGA) and processor operate concurrently
3) Load balancing: near peak processor utilization at all times (performing useful work)

Specification:
    process (a, b, c) in port a, b; out port c; { read(a); … write(c); }
Processor:
    Line () { a = … … detach }
[Figure: flow with blocks Capture, Model, Partition, Synthesize, Interface, targeting a Processor and an FPGA]
Terminology: Hypergraphs
• A netlist is a hypergraph.
• Hypergraphs can be approximated as graphs by breaking each hyperedge into a clique of edges.
• A hypergraph $H = \langle V, E_h \rangle$
  – $V$ is a set of vertices
  – each hyperedge $h \in E_h$ is a subset of vertices, so $E_h \subseteq 2^V$
• A graph $G = \langle V, E \rangle$
  – $V$ is a set of vertices
  – each edge $e \in E$ is a pair of vertices $(u, v)$
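The clique approximation mentioned above can be sketched in a few lines. This is an illustrative helper, not taken from the course material: the function name `clique_expand` and the sample nets are assumptions, and edge weights are ignored for simplicity.

```python
from itertools import combinations


def clique_expand(hyperedges):
    """Approximate a hypergraph by a graph: every hyperedge (net) is
    replaced by a clique, i.e. one edge per pair of its vertices."""
    edges = set()
    for net in hyperedges:
        for u, v in combinations(sorted(set(net)), 2):
            edges.add((u, v))
    return sorted(edges)


# A hypothetical netlist: one 3-pin net and one 2-pin net.
nets = [("a", "b", "c"), ("c", "d")]
graph_edges = clique_expand(nets)
print(graph_edges)
```

A k-pin net becomes k(k-1)/2 graph edges, which is why algorithms that operate directly on hypergraphs are attractive for large netlists.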
Bi-partitioning Problem
• Given a (hyper)graph $G$,
• find a partition $P$ of $V$ into $V_1, V_2$ such that $V_1 \cap V_2 = \emptyset$ and $V_1 \cup V_2 = V$,
• minimizing the number of edges that cross the cut:
  $\min\, c(P) = \sum_{h} w(h)$, summed over every edge $h$ that connects some $u \in V_1$ to some $v \in V_2$,
• subject to a capacity constraint that bounds the ratio $|V_1| / |V_2|$ from above and below.
Bipartitioning Approaches
• Exact methods:
  – Mixed Integer Programming (using Branch and Bound)
  – min-cut / max-flow (Ford-Fulkerson 1962)
    · maximum flow through graph = minimum cut
    · useful for establishing an unconstrained bound
• Heuristics (local search):
  – Kernighan-Lin (1970)
    · operates on graphs
    · swap all nodes once, in pairs that yield max. gain
    · choose greatest gain over pass, repeat until no improvement
    · $O(n^2 \log n)$
  – Fiduccia-Mattheyses (1982)
    · operates on hypergraphs
    · $O(p)$, linear time!
• Meta-heuristics (avoid getting stuck in local minima):
  – Simulated annealing
    · select some random moves based on “temperature”
    · design hopefully “cools” into optimal solution
    · computationally intensive
  – Tabu Search
  – Genetic Algorithms
  – Particle Swarm Optimization
Fiduccia-Mattheyses

    generate initial partition
    calculate gain g(c) of moving each cell c
    while improvement {
        clear all cell locks;
        while max g(c) > 0 over unlocked cells c {      // one pass: O(p)
            select the unlocked cell c with max g(c);
            move c across the cut;
            lock c;
            update g(c') for all of c's neighbors c';
        }
    }
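One pass of the loop above can be sketched as follows. This is a simplified illustration rather than the full algorithm: it works on a plain graph with unit edge weights instead of a hypergraph, recomputes gains from scratch instead of using a bucket array (so it is not linear time), and stops as soon as no unlocked cell has positive gain, as in the pseudocode. The `max_imbalance` parameter and the sample graph are assumptions.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def gain(cell, edges, side):
    """Reduction in cut size obtained by moving `cell` across the cut."""
    g = 0
    for u, v in edges:
        if cell == u or cell == v:
            other = v if cell == u else u
            g += 1 if side[other] != side[cell] else -1
    return g


def fm_pass(edges, side, max_imbalance=2):
    """One simplified pass: move and lock the best unlocked cell while a
    positive-gain move that respects the balance limit exists."""
    side = dict(side)
    locked = set()
    while True:
        values = list(side.values())
        sizes = [values.count(0), values.count(1)]
        best, best_gain = None, 0
        for cell in sorted(side):
            if cell in locked:
                continue
            src = side[cell]
            # Skip moves that would leave the two sides too unequal.
            if abs((sizes[src] - 1) - (sizes[1 - src] + 1)) > max_imbalance:
                continue
            g = gain(cell, edges, side)
            if g > best_gain:
                best, best_gain = cell, g
        if best is None:
            break
        side[best] = 1 - side[best]   # move across the cut
        locked.add(best)              # lock for the rest of the pass
    return side


# Hypothetical graph: two triangles joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
result = fm_pass(edges, start)
print(cut_size(edges, start), "->", cut_size(edges, result))  # 5 -> 1
```

The production algorithm keeps gains in a bucket array and updates only the neighbors of the moved cell, which is what gives the O(p) cost per pass quoted on the slide.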
Example
[Figure: graph with six cells a, b, c, d, e, f]
• All edges have unit weight.
• Given balance criterion: $|V_1| - 1 \le |V_2| \le |V_1| + 1$
• Goal: partition the graph into two disjoint halves so as to minimize the number of hyperedges that span the cut.
Example (cont’d)
Step 1. A random partition is assigned so as to keep balance.
number of cuts = 5

Example (cont’d)
Step 2. Initial gains are calculated for each cell, and the results are placed into the bucket array.
[Figure: cell gains +1, +2, +2, +1, -1, +2]
number of cuts = 5

Example (cont’d)
Step 3. A cell is selected, the gains of critical nets are updated, and the cell is locked from further movement.
[Figure: cell gains +1, 0, 0, +1, -1, 0]
number of cuts = 3

Example (cont’d)
Step 3 (cont’d). Another cell is selected, the gains of critical nets are updated, and the cell is locked from further movement.
[Figure: cell gains 0, 0, 0, -1, -1, 0]
number of cuts = 2
Co-design: Tools
• Co-design tools should provide an almost automatic framework for producing a balanced and optimized design from some initial high level specification.
• The goal of co-design tools and platforms is not to push towards total automation.
• Designer interaction and continuous feedback are considered essential.
• The main goal is to incorporate into the “black box” of co-design tools support for shifting functionality and implementation between HW and SW, with effective and efficient evaluation.
H/S Co-Design: Approaches
Opposite strategies:
• Vulcan (“primal” approach)
  – Functionality all in HW (HardwareC) initially
  – Move some to CPU to reduce architecture cost
• Cosyma (“dual” approach)
  – Functionality all in SW (Cx) initially
  – Move some to ASIC to meet performance goals
• Lycos
  – Convert all functionality to neutral form
Partitioning Algorithms
• Assume everything is initially in software.
• Select a task for swapping.
• Migrate it to hardware and evaluate the cost:
  – Timing, hardware resources, program and data storage, synchronization overhead
• Cost evaluation and move evaluation are similar to what we’ve seen for the min-cut FM algorithm.
[Figure: a task moving from the software list of tasks to the hardware list of tasks]
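The migrate-and-evaluate loop described above can be sketched as a greedy heuristic. This is one possible illustration, not the algorithm used by any particular tool: it assumes tasks run sequentially, models the cost with a few made-up fields (`sw_cycles`, `hw_cycles`, `hw_area`, `comm_cycles`), and picks the task with the best cycles-saved-per-area ratio. All task names and numbers are hypothetical.

```python
def greedy_partition(tasks, deadline, area_budget):
    """Start with every task in software and migrate tasks to hardware,
    best savings per unit area first, until the deadline is met or no
    further task fits in the area budget."""
    in_hw = []
    total_cycles = sum(t["sw_cycles"] for t in tasks.values())
    area_used = 0
    while total_cycles > deadline:
        best, best_ratio = None, 0.0
        for name, t in tasks.items():
            if name in in_hw or area_used + t["hw_area"] > area_budget:
                continue
            # Net saving includes the hardware/software communication cost.
            saved = t["sw_cycles"] - t["hw_cycles"] - t["comm_cycles"]
            ratio = saved / t["hw_area"]
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # nothing left that fits and helps
        t = tasks[best]
        in_hw.append(best)
        total_cycles -= t["sw_cycles"] - t["hw_cycles"] - t["comm_cycles"]
        area_used += t["hw_area"]
    return in_hw, total_cycles, area_used


# Hypothetical per-task estimates (cycles and area units).
tasks = {
    "input_decoding": {"sw_cycles": 20, "hw_cycles": 8, "hw_area": 6, "comm_cycles": 2},
    "fir_filter": {"sw_cycles": 60, "hw_cycles": 15, "hw_area": 12, "comm_cycles": 5},
    "tick_to_speed": {"sw_cycles": 40, "hw_cycles": 20, "hw_area": 10, "comm_cycles": 4},
    "output_encoding": {"sw_cycles": 12, "hw_cycles": 6, "hw_area": 5, "comm_cycles": 2},
}
in_hw, cycles, area = greedy_partition(tasks, deadline=100, area_budget=15)
print(in_hw, cycles, area)
```

Including the communication cost in the saving reflects the synchronization overhead listed on the slide; a task that looks fast in hardware can still be a poor move if it exchanges a lot of data with software.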
Automation
Compiler profiler determines dependence and rough performance estimates
Result of compilation is synthesizable HDL and assembly code for the processor
Interfacing
Interfacing between software and hardware modules is crucial for successful co-design:
I. How data is passed between sub-modules efficiently.
II. The rate of exchange of information between modules.
[Figure: flow System Description, Hw/Sw Partitioning, Co-Synthesis (Software, Interface, Hardware), System Integration, Co-Simulation]
Interface Models: FIFO
• Synchronization through a FIFO.
• The FIFO can be implemented either in hardware or in software.
• Effectively reconfigure hardware (FPGA) to allocate buffer space as needed.
• Interrupts are used for the software version of the FIFO.
[Figure: data items d1, d2, d3 and processes p1, p2, p3 (with r2, r3) communicating through a Control/Data FIFO in the FPGA]
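The synchronization role of the FIFO can be illustrated with a small software model. This is a sketch of a generic bounded ring buffer, assuming a single producer and a single consumer; the class name `BoundedFifo` and its methods are illustrative and do not correspond to any Xilinx API.

```python
class BoundedFifo:
    """Fixed-capacity FIFO modelled as a ring buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.head = 0    # index of the oldest item
        self.count = 0   # number of items currently stored

    def push(self, item):
        """Producer side: returns False when the FIFO is full, which is
        the point where the producer would have to wait."""
        if self.count == self.capacity:
            return False
        self.buf[(self.head + self.count) % self.capacity] = item
        self.count += 1
        return True

    def pop(self):
        """Consumer side: returns None when the FIFO is empty."""
        if self.count == 0:
            return None
        item = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return item


fifo = BoundedFifo(capacity=2)
accepted = [fifo.push("d1"), fifo.push("d2"), fifo.push("d3")]
first = fifo.pop()
print(accepted, first)
```

The full and empty conditions are what throttle the faster side; in a software implementation an interrupt would typically signal the consumer when data arrives, as noted on the slide.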
Warp Processors
[Figure: MIPS/ARM processor with I$ and D$, a profiler, a Dynamic Partitioning Module (DPM) and configurable logic]
1. Initially execute the application in software only.
2. Profile the application to determine critical regions.
3. Partition critical regions to hardware.
4. Program the configurable logic & update the software binary.
5. The partitioned application executes faster with lower energy consumption.
Summary
• Hardware/Software co-design is becoming the common design style for building systems.
• H/S co-design allows the majority of a system to be designed quickly with standardized parts, while special purpose hardware is used for time critical portions of the system.
• Xilinx and Altera provide a complete flow for H/S co-design.
• Issues:
  I. How to partition the system?
  II. Communication overhead!
  III. Platforms to be used
  IV. Languages that support this paradigm
Embedded CPUs
• PowerPC 405 (hard core)
  – 32 bit embedded PowerPC RISC architecture
  – Up to 450 MHz
  – 2x16 kB instruction and data caches
  – Memory management unit (MMU)
  – Embedded in Virtex-II Pro and Virtex-4 FX
• ARM Cortex-A9 (hard core)
  – 32 bit multicore processor
  – Up to 900 MHz
  – Xilinx Zynq 7000 processing platform
  – Device is processor based, attached to FPGA
  – High level of performance
  – Reduces power, cost, size
• MicroBlaze (soft core)
  – 32 bit RISC architecture
  – 2 64 kB instruction and data caches
  – Hardware multiply and divide
  – OPB and LMB bus interfaces
  – ...
Embedded Processors

Embedded Processor      Core Type   Max Clock Frequency   Slices   PLBs   Block RAMs
PowerPC                 Hard        222 MHz               1000     250    9
Microblaze              Soft        180 MHz               940      235    9
Picoblaze               Soft        221 MHz               333      84     3
Picoblaze (optimized)   Soft        233 MHz               274      69     3

• Hard core: faster, fixed position, few devices
• Soft core: slower, can be placed anywhere, applicable to many devices
[Figure: Virtex-4 processors: PowerPC, MicroBlaze, PicoBlaze]
Soft and Hard cores in current FPGAs
[Figure: board-level system built from separate parts: Power Supply, CLK, custom IF-logic, SDRAM, SRAM, Memory Controller, UART, LC Display Controller, Interrupt Controller, Timer, Audio Codec, CPU (uP / DSP), Co-Proc., GP I/O, Address Decode Unit, Ethernet MAC]

Next Step...
[Figure: the same system with the CLK, custom IF-logic, Memory Controller, UART, Display Controller, Timer, Interrupt Controller, CPU (uP / DSP), Co-Proc., GP I/O, Address Decode Unit and Ethernet MAC moved into the FPGA]

Configurable System on a Chip (CSoC)
[Figure: the remaining external parts are Power Supply, SDRAM, SRAM, LC, Audio Codec and EPROM]
Soft CPU Core: “MicroBlaze” (Xilinx Inc.)

PowerPC-based Embedded Design
• Full system customization to meet performance, functionality, and cost goals
• IBM CoreConnect™ on-chip bus standard: PLB, OPB, and DCR
[Figure: PowerPC405 core (dedicated hard IP) with ISOCM and DSOCM BRAM, RocketIO and DCR bus; Processor Local Bus (PLB, instruction and data) with arbiter, hi-speed peripherals, GB E-Net and memory controller; bus bridge to the On-Chip Peripheral Bus (OPB) with arbiter, UART, GPIO and on-chip peripherals (flexible soft IP); off-chip memory: ZBT SRAM, DDR SDRAM, SDRAM]
MicroBlaze-based Embedded Design
[Figure: MicroBlaze 32-bit RISC core (flexible soft IP) with BRAM local memory bus, I-cache and D-cache BRAM of configurable sizes, and LocalLink™ FIFO channels 0, 1 … 32 to custom functions; On-Chip Peripheral Bus (OPB) with arbiter, UART, 10/100 E-Net, on-chip peripherals and off-chip memory (FLASH/SRAM); possible in Virtex-II Pro: a bus bridge to the Processor Local Bus (PLB) with arbiter, PowerPC405 core (dedicated hard IP), hi-speed peripherals, GB E-Net and memory controller]
MicroBlaze: Architecture & Features
• RISC
• Thirty-two 32-bit general purpose registers
• 32-bit instruction word with three operands and two addressing modes
• Separate 32-bit instruction and data buses: OPB (On-chip Peripheral Bus)
• Separate 32-bit instruction and data buses: LMB (Local Memory Bus)
[Figure: MicroBlaze architecture with OPB and LMB interfaces]
MicroBlaze: Bus Configurations
[Figure: six bus configurations (1 to 6) of the MicroBlaze core]
• LMB: Memory Controller (BRAMs)
• OPB: Ext. Memory Ctrl., Interrupt Ctrl., UART, Timer, Watchdog, SPI, JTAG-UART, etc.
Embedded Development Tool Flow Overview
• Standard embedded SW development flow: C Code, Compiler/Linker, (Simulator), Object Code, Debugger, Download to Board & FPGA
  – CPU code in on-chip memory, or CPU code in off-chip memory
• Standard FPGA HW development flow: VHDL/Verilog, Synthesizer, Simulator, Place & Route, Download to FPGA
EDK
• The Embedded Development Kit (EDK) consists of the following:
  – Xilinx Platform Studio – XPS
  – Base System Builder – BSB
  – Create and Import Peripheral Wizard
  – Hardware generation tool – PlatGen
  – Library generation tool – LibGen
  – Simulation generation tool – SimGen
  – GNU software development tools
  – System verification tool – XMD
  – Virtual Platform generation tool – VPgen
  – Software Development Kit (Eclipse)
  – Processor IP
  – Drivers for IP
  – Documentation
• Use the GUI or the shell command tool to run EDK

EDK Files
• MHS = Microprocessor Hardware Specification
• MSS = Microprocessor Software Specification
• MPD = Microprocessor Peripheral Description
• PAO = Peripheral Analyze Order
• BBD = Black-Box Definition
• MDD = Microprocessor Driver Description
• BMM = BRAM Memory Map
Design Flow: Hardware I
[Figure: in EDK / Xilinx Platform Studio, the platform definition (peripherals, configuration, connectivity, address space) is captured in *.mhs and used to generate the netlist]
Design Flow: Hardware II, ISE Env
• EDK: Embedded Development Kit
• XPS: Xilinx Platform Studio
• ISE: Integrated Software Environment
• MHS: Microprocessor Hardware Specification
[Figure: XPS takes the platform definition (peripherals, configuration, connectivity, address space) in *.mhs and generates the netlist; ISE (Project Navigator / VHDL, platform extensions) uses the netlist and *.ucf to generate the bitstream *.bit]
Design Flow: Software
[Figure: the hardware side is unchanged (platform definition, *.mhs, generate netlist in XPS; *.ucf, generate bitstream *.bit in ISE); on the software side, libraries and *.h headers are generated (Gen. Libs), and *.c / *.asm sources are compiled and linked into *.elf]
Design Flow: Combine HW + SW
[Figure: the hardware flow produces *.bit and *.bmm, the software flow produces *.elf; the Update Bitstream step merges the *.elf into the *.bit using the *.bmm memory map, producing the final *.bit]
Summary
• Xilinx provides a CAD tool in the form of EDK/ISE to implement a soft core and manage the whole hardware/software development process.
• A soft core such as a single MicroBlaze enables hardware/software co-design, where sequential code runs on the processor and bottlenecks run on a dedicated hardware accelerator attached to the MicroBlaze.