Post on 15-Mar-2022
transcript
Avnet SpeedWay Workshops
1
Accelerating Your Success™
V10_1_2_0
Avnet Speedway Design Workshop™
Creating FPGA-based Co-Processors for DSPs Using Model Based Design Techniques
Lecture 5: Creating a Stand-alone Video System
Develop Executable Spec in Simulink
Partition Between DSP and FPGA Co-Processor
Model-Based Design Flow
Design Exploration for Targeting Hardware
Verify Hardware in HW Co-simulation
Implement Stand-Alone Video System
The final design phase, after verification in simulation, is implementation as a stand-alone system comprising a DSP and an FPGA co-processor.
The Problem We Wish to Solve
Maintaining a complex system involving DSP and FPGA co-processor can be tedious and error-prone.
MathWorks model-based design bridges TI DSP and Xilinx FPGA design flows with automatic code generation to remove the grunt work of manually maintaining the API, including memory-maps, function headers and C-code device drivers in Code Composer Studio.
Final FPGA co-processor system offers better performance.
Agenda
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation
[Figure: model-based design flow. MathWorks: MATLAB® and Simulink® (algorithm and system design) feed Real-Time Workshop Embedded Coder, Targets and Links, which generate C / ASM for TI Code Composer through Link for CCS. Xilinx: System Generator for DSP generates HDL for ISE and supports hardware co-simulation. Avnet: the Spartan3A-DSP FPGA / DaVinci platform hosts the DSP and FPGA, with a video source in, an LCD panel out, and ChipScope for verification.]
Design Flow for Stand-Alone Implementation…
< mouse click >
We begin by examining the connectivity for data transfer between the DM6437 and FPGA co-processor.
< mouse click >
We continue with automatic code generation of executables for both DSP and FPGA, including the Avnet board support package for Simulink on the Avnet Spartan-3A DSP DaVinci Development Kit.
< mouse click >
We conclude with in-system verification techniques for the combined DSP and FPGA co-processor system.
Note that video now flows into the system from a live source, contrary to video frames generated by a Simulink testbench for verification using hardware co-simulation.
[Figure: Simulink video stabilization model (algorithm and system design), partitioned between DSP and FPGA. Blocks: motion estimation by Sum-of-Absolute-Differences (SAD) location estimation, producing the relative motion vector from frame to frame; image translate; updated template and updated ROI fed back.]
Video Stabilization Model
Recall the steps of labs 3 and 4, where a Simulink model was partitioned between DSP and FPGA …
Moving to a stand-alone implementation, we must now bridge the FPGA co-processor hardware and the DSP software.
[Figure: DSP core and FPGA co-processor, with the connection between them shown as a question mark.]
• Requires hardware interface and communication protocol
• Managing asynchronous clock domains
• Software API to communicate with hardware
Bridging Software to Hardware.
Bridging software (DSP) to hardware (FPGA Co-Processor) requires:
•hardware interface and communication protocol
•managing asynchronous clock domains
•software API to communicate with hardware
Mouse click …
How can this be implemented ?
Let's examine these aspects in detail, especially as they relate to exchanging streaming data such as video between the FPGA co-processor and the DSP.
[Figure: DSP core and FPGA co-processor sharing data and control over the EMIF bus.]
Data and control on common bus (EMIF)
– Obliges burst transfer over time-shared bus
– Inefficient for streaming data (e.g. video)
– Requires inserted syncs, framing in DSP software, handshaking
Bridging Software to Hardware / EMIF
Bridging software on the DSP-side to the hardware co-processor requires first and foremost a hardware interface and communication protocol. One possibility is EMIF, ‘External Memory Interface’, which groups address, data and control signals for interface to external devices. EMIF comes in a variety of sizes across different families of DaVinci, from synchronous 32-bit data on DM642 to asynchronous 8-bit data on the DM6437.
It is convenient to differentiate between control data and streaming data. Control data is often bursty in nature and not time-critical, while streaming data is constant and requires a fixed bandwidth. Exchanging streaming data such as video between DSP and FPGA co-processor over a shared bus such as EMIF requires time-multiplexed burst transactions to accommodate other devices' access to the bus. Control data must then be inserted between the streaming data bursts. Furthermore, exchanging video over a bus such as EMIF would necessitate inserted syncs, framing in software in the DSP and asynchronous FIFOs in the FPGA. For these reasons, EMIF is not the best choice of interface between the DSP and FPGA co-processor.
[Figure: DSP core connected to the FPGA co-processor by three links: control over VLYNQ (to the VLYNQ LogiCORE), video over VPFE (VPFE interface) and video over VPBE (VPBE interface).]
Separate data and control
– Streaming full-duplex video over dedicated Video Processing Subsystem of DM6437
– Control over VLYNQ
– Simple, fast, efficient
Bridging Software to Hardware / VLYNQ
A simple and efficient approach is to transport streaming data over dedicated ports of the Video Processing Subsystem, while control data flows through a separate, non-time critical link. This offers simple, fast uninterrupted bi-directional streaming video between DSP and FPGA co-processor.
Let's examine the resources on DM6437 to implement separate video and control interfaces to the FPGA co-processor.
------------------------------------------------------
Why does video flow thru FPGA and not directly to DSP ?
… because board is built to pipe video thru FPGA to/from DSP.
------------------------------------------------------
Note: Although not officially supported, TI has done some work to allow general-purpose data, not just video, to flow into the VPFE and out of the VPBE ports.
Contact Bernie Thompson at TI.
[Figure: DSP core connected to the FPGA co-processor: control over VLYNQ to the VLYNQ LogiCORE, video over VPFE to the VPFE interface.]
VLYNQ
• Xilinx and TI collaborating for seamless interconnection between DSPs and FPGAs
• Low pin count, low cost, scalable bandwidth
• DaVinci has on-chip VLYNQ peripheral
• Xilinx VLYNQ LogiCORE™ IP delivered through Xilinx CORE Generator
VLYNQ is a serial (i.e. low pin count) communications interface that enables the extension of an internal bus segment to one or more external physical devices (ex. FPGA). VLYNQ accomplishes this function by serializing bus transactions in one device, transferring the serialized transaction between devices via a VLYNQ port, and de-serializing the transaction in the external device.
VLYNQ peripheral is offered in DaVinci (DM644x and DM643x devices), Jacinto, Avalanche, Puma, Sangam, Titan, APEX and other TI communication processors.
Xilinx has licensed VLYNQ, so it is a great opportunity to connect FPGAs to TI DSPs, in addition to EMIF (External Memory Interface) and Serial RapidIO (SRIO).
[Figure: VLYNQ link to the Xilinx FPGA, with CLOCK, optional CLK REQ, TRANSMIT and RECEIVE signals. Packet fields shown: CMD 1 (10 bits), CMD 2 (10 bits), Byte Count (10 bits), Address (< 4 × 10 bits), Data (N × 10 bits), End of Packet EOP (10 bits), Address Mask (4 bits), Packet Type (4 bits).]
• Scalable to meet bandwidth requirements (3-pin to 10-pin)
• Single-ended, unidirectional I/O
• 8b/10b encoding, in-band signaling
• Memory mapped, master and slave on a single bus
• Software transparent for future device integration
• High-speed, low pin-count, full-duplex, peer-to-peer serial interface
• Extension of an internal bus segment to one or more external devices
• Point-to-point serial interface for other VLYNQ-compatible devices
• External devices are mapped to local physical address space and appear as if they are on the internal bus of the local device
VLYNQ
The Avnet Spartan-3A DSP DaVinci board uses all 4 data pairs (transmit / receive). Individual pins can serve as GPIO if a lower-bandwidth VLYNQ interface is desired or if VLYNQ is not used.
Maximum Effective Throughput with 99 MHz clock (100 MHz max clock supported). All benchmarks use 4 VLYNQ transmit/receive pairs.

Burst Size in 32-bit Words   Throughput (Mbits/sec)   Throughput (Mbytes/sec)
1                            126.72                   15.84
4                            220.37                   27.55
8                            259.93                   32.49
16                           285.56                   35.7

• 8b/10b coding causes 20% overhead: only 8 bits of data are contained in every 10 bits sent
• Total overhead = protocol overhead + 8b/10b overhead
• Theoretical maximum throughput = 4 data lines × 100 MHz max clock = 50 Mbytes/sec
VLYNQ Performance
The max write rate describes the maximum available data rate of the serial interface for transmission, taking the 8b/10b encoding overhead into account. It is calculated as follows:
Max write rate = VLYNQ serial clock (MHz) × number of pins × 8b/10b encoding factor
The 8b/10b encoding costs 20% overhead, so the effective data throughput is the raw rate multiplied by a factor of 0.8. For example, if the VLYNQ clock runs at 99 MHz on an interface with 4 pins per direction, the raw data rate is 99 × 4 = 396 Mbps. After the 8b/10b encoding is removed, the maximum write rate is 396 × 0.8 = 316.8 Mbps.
The total throughput on the VLYNQ interface includes both transmit and receive directions. In the above configuration, a remote device can write to the local device at the same data rate, so the total throughput is the sum of the transmit and receive rates, or 633.6 Mbps. In addition to the 8b/10b encoding, the packet structure for read/write operations adds further overhead. The VLYNQ module can transfer single 32-bit words or a burst of up to sixteen 32-bit words.
The data and throughput calculations shown here are sample calculations for ideal conditions. In general, the data rates depend on a variety of other factors: the efficiency of read/write burst transactions, the ability to buffer read/write data, how well data can be serially shifted out without stalling additional read/write bursts, and the remote and local components, both external and internal (device operations, board considerations, etc.).
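The arithmetic above can be sketched in a few lines of C. This is only an illustration of the formula in these notes; the function and macro names are made up for the example and are not part of any TI or Xilinx API.

```c
#include <assert.h>

/* Fraction of transmitted bits that carry payload under 8b/10b coding. */
#define VLYNQ_8B10B_FACTOR 0.8

/* Maximum write rate in Mbits/sec for one direction of the link.
 * clock_mhz: VLYNQ serial clock in MHz
 * pins:      number of data pins per direction */
double vlynq_max_write_rate_mbps(double clock_mhz, int pins)
{
    assert(clock_mhz > 0.0 && pins > 0);
    return clock_mhz * pins * VLYNQ_8B10B_FACTOR;
}

/* Share of the max write rate reached by a measured throughput (0..1).
 * What is missing from 1.0 is the packet/protocol overhead. */
double vlynq_protocol_efficiency(double measured_mbps, double max_mbps)
{
    assert(max_mbps > 0.0);
    return measured_mbps / max_mbps;
}
```

With a 99 MHz clock and 4 pins per direction, vlynq_max_write_rate_mbps(99.0, 4) gives the 316.8 Mbps quoted above. Comparing the benchmark figures against that value, the single-word transfer at 126.72 Mbits/sec reaches 40% of the max write rate and the 16-word burst at 285.56 Mbits/sec reaches roughly 90%, which shows how longer bursts amortize the packet overhead.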
References:
TMS320DM643x DMP VLYNQ Port User's Guide / TI Literature: spru938b.pdf (Appendix B)
[Figure: the DM6437 (DSP core, Video Processing Subsystem, VLYNQ) maps four regions of its local address space, through VLYNQ and address decoding, onto four peripherals in the remote device.]

Local map region   Local address range      Remote peripheral   Remote address range
Map Region 1       0400:0000 - 07FF:FFFF    Peripheral A        0000:0000 - 03FF:FFFF
Map Region 2       0800:0000 - 0800:00FF    Peripheral B        0400:0000 - 0400:00FF
Map Region 3       0800:0100 - 0801:00FF    Peripheral C        0500:0000 - 0500:FFFF
Map Region 4       0801:0100 - 0841:00FF    Peripheral D        0B00:0000 - 0B3F:FFFF
VLYNQ Remote Memory Mapping
• Remote VLYNQ devices memory mapped to the local (DSP host) device’s address space
• Finer memory-decoding can target smaller address ranges within the FPGA co-processor
Remote VLYNQ device(s) are memory mapped to the local (host) device’s address space when a link is established (and appear as if they are on the internal bus, similar to any other on-chip peripherals). Enumerating the VLYNQ devices (single or multiple) into a coherent memory map for accessing each device is part of the initialization sequence.
After the enumeration, the host (local) device can access the remote device address map using local device addresses. The VLYNQ module in the host device manages the address translation of the local address to the remote address. A remote VLYNQ device is mapped to the local device’s address via the address map registers (TX address map, RX address map size n, RX address map offset n, where n = 1 to 4). The transmit side has a contiguous map; the size of the map is the same as the remote device map.
The figure illustrates this mapping.
This capability makes VLYNQ ideal for memory-mapping FPGA-based peripherals. For clarity, only 4 peripherals are shown above; finer memory-decoding can target any number of smaller address ranges to communicate with registers within the FPGA co-processor. The Avnet VLYNQ block allows memory-mapped address spaces down to single-register level using System Generator shared memory registers.
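As a rough illustration of the translation described above, the following C sketch models the four-region example from the figure: the local address has the TX address map base stripped off, and the resulting offset is matched against the RX map sizes in order, then added to the matching RX map offset. The type and function names are hypothetical and do not come from the TI driver; the address values are the ones shown in the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One RX map region on the remote side (illustrative type). */
typedef struct {
    uint32_t size;    /* RX address map size n, in bytes          */
    uint32_t offset;  /* RX address map offset n (remote address) */
} vlynq_rx_region_t;

/* Local base of the contiguous TX window, from the figure. */
#define TX_ADDR_MAP_BASE 0x04000000u
#define NUM_RX_REGIONS   4

static const vlynq_rx_region_t rx_regions[NUM_RX_REGIONS] = {
    { 0x04000000u, 0x00000000u },  /* Map Region 1 -> Peripheral A */
    { 0x00000100u, 0x04000000u },  /* Map Region 2 -> Peripheral B */
    { 0x00010000u, 0x05000000u },  /* Map Region 3 -> Peripheral C */
    { 0x00400000u, 0x0B000000u },  /* Map Region 4 -> Peripheral D */
};

/* Translate a local (DSP) address into the remote (FPGA) address.
 * Returns 0 on success, -1 if the address lies outside all regions. */
int vlynq_translate(uint32_t local_addr, uint32_t *remote_addr)
{
    assert(remote_addr != NULL);
    if (local_addr < TX_ADDR_MAP_BASE)
        return -1;

    /* Strip the TX address map base to get the offset into the window. */
    uint32_t off = local_addr - TX_ADDR_MAP_BASE;

    /* The regions are contiguous in the local window: walk them in order. */
    for (size_t i = 0; i < NUM_RX_REGIONS; i++) {
        if (off < rx_regions[i].size) {
            *remote_addr = rx_regions[i].offset + off;
            return 0;
        }
        off -= rx_regions[i].size;
    }
    return -1;
}
```

For example, local address 0800:0104 falls 4 bytes into Map Region 3 and translates to 0500:0004 in Peripheral C, while 0841:0100 lies just past Map Region 4 and is rejected.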
---------------------------------------------------------------------------------------------------------------------
Reference:
In the local device, the address of the VLYNQ remote memory map in the local configuration space is the transmit address accessing remote devices over the serial interface. The address of the VLYNQ remote memory map is programmed in the TX address map register (XAM). When the local device transmits, first it strips off the transmit address offset in the local device memory map. Then the local device sends the
Agenda
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation
Integration of 3 Pieces of Avnet Hardware
PS Video EXP + Spartan-3A DSP DaVinci Evaluation Kit + 6.5” NEC LCD panel
6.5” NEC panel is targeted for $500 resale, but we do not have an established price yet.
[Figure: block diagram of the evaluation kit. The Spartan 3A-DSP 3SD1800A FPGA and the DaVinci DM6437 are linked by VLYNQ, 8-bit EMIF, VPFE and VPBE. Surrounding blocks: DDR2 (×2), parallel Flash (×2), SPI Flash (×2), 10/100/1G PHY, 10/100 PHY, RS232 (×2), JTAG (×2), LEDs (×2), EXP, image sensor interface, component video out, audio CODEC, switches, clocks, USB, I2C, McBSP1.]
Avnet Spartan-3A DSP DaVinci Evaluation Kit
The Avnet Spartan-DSP DaVinci Evaluation Platform combines on the same baseboard the new Xilinx Spartan 3A-DSP FPGA and TI DaVinci TMS320DM6437 Digital Media Processor, optimized for video applications such as surveillance, automotive, machine vision.
DM6437 connects to Spartan3A-DSP over several interfaces : VLYNQ, EMIFA, VPBE, VPFE.
Features:
• Xilinx 3SD1800A-FG676 FPGA
• Programmable LVDS Clock Generator
• On-board 27 MHz LVTTL Oscillator
• On-board LVTTL Oscillator Socket
• 16M x 32-bit DDR2 SDRAM
• 256K x 36-bit ZBT SRAM
• EXP Expansion Slot
• 10/100 PHY
• 64Mb x 2 SPI Configuration Flash
• JTAG Programming/Configuration Port
• RS232 Port
• Two User LEDs
• A 4-position User DIP Switch
• Three User Push Button Switches
• Audio CODEC shared with DM6437
TI DaVinci DSP Processor
• TMS320DM6437 Digital Media Processor
• 128 MB 166 MHz DDR2 SDRAM
• 64 Mb serial SPI Flash program code storage
• 10/100 PHY
• VGA Out
• Audio CODEC shared with FPGA
FPGA
• Xilinx XC3SD1800A-4FG676C FPGA
• Clocks
– Programmable LVDS clock generator
– On-board 27 MHz LVTTL oscillator
– On-board LVTTL oscillator socket
• Memory
– 128M x 32-bit DDR2 SDRAM
– 16M x 8 parallel / BPI configuration Flash
– 64Mb SPI configuration/storage Flash
• Interfaces
– 10/100/1000 PHY
– JTAG programming/configuration port
– RS-232 serial port
– Image Sensor Interface
– 2 EXP expansion connectors
• Buttons and switches
– 4 LEDs
– Eight 4-position DIP switch
– 4 push-button switches
DSP
• TI TMS320DM6437 DaVinci Processor
• Memory
– 128 MB 166 MHz DDR2 SDRAM
– 128 Mb parallel Flash program code storage
– 64 Mb serial SPI Flash program code storage
• Interfaces
– 10/100 Ethernet Port
– Component and composite video out
– Audio CODEC shared with FPGA
– USB
• Buttons and switches
– 4 User LEDs
Avnet Spartan-3A DSP DaVinci Evaluation Kit
• High-Definition Video Decoder – Texas Instruments TVP7001 (RGB, Component)
• Standard-Definition Video Decoder – Texas Instruments TVP5150 (Composite, S-Video)
• DVI Transmitter – TFP410
• DVI Receiver – AD9887A
• Analog Devices ADV7123 RGB DAC
• Parallel RGB and LVDS interfaces to Flat Panel Displays
• Stereo Audio CODECs
Avnet Pro-Sumer Video EXP Module
The Avnet EXP ProSumer Video (EXP PS Video) Module is a plug-in module designed to interface with compatible Avnet baseboards, including the Avnet Spartan-DSP DaVinci Evaluation Platform. The EXP PS Video Module provides a number of video and audio interfaces to its host via two EXP connectors.
• NEC XGA LCD flat panel display - NL10276BC13-01C
• Super-Transmissive Natural Light TFT
• 1024 x 768 Resolution
• 6.5 inches Diagonal
• 16.77M colors
• LVDS Interface
• LED Backlight
NEC TFT Display
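[Figure: video from the VPBE enters the FPGA on the Avnet Spartan3A-DSP / DaVinci Evaluation Kit, passes through a 2X scaler, and reaches the flat panel controller as 24-bit RGB at 62.5 MPixels / sec. The controller drives the 1024 x 768 XGA flat panel display over LVDS at 62.5 x 7 = 437.5 Mbps.]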
LVDS Flat Panel Controller
Avnet provides a controller for LVDS flat panel displays. It is provided at no extra cost to customers who purchase the PS Video EXP module.
RGB + syncs digital video arrives at the flat panel controller at 62.5 MPixels / sec.
The outputs of the LVDS flat panel controller comprise 5 LVDS transmit pairs:
• a forwarded clock at 1/7th the bit rate with 4:3 duty cycle, on the LCD_FTXC pair
• 4 data lines LCD_FTX[3:0], each of which carries a 7:1 serialized bit stream.
These 5 LVDS transmit pairs originate from the baseboard FPGA and are routed up through the EXP connector to J6 of the Avnet EXP PS Video module.
J6 is a JAE FI-X30S-HF connector that accepts a cable assembly to drive a NEC 6.5” XGA TFT-LCD module.
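The link rates follow directly from the pixel rate and the 7:1 serialization. The short C sketch below reproduces that arithmetic; the names are illustrative only and are not taken from the Avnet controller.

```c
#include <assert.h>

#define LVDS_SERIALIZATION_FACTOR 7  /* 7:1 serialization per data line */
#define LVDS_DATA_LINES           4  /* LCD_FTX[3:0]                    */

/* Bit rate of one LVDS data line in Mbps for a given pixel rate in MPixels/sec. */
double lvds_line_rate_mbps(double pixel_rate_mhz)
{
    assert(pixel_rate_mhz > 0.0);
    return pixel_rate_mhz * LVDS_SERIALIZATION_FACTOR;
}

/* Number of bit slots carried across all data lines in one pixel period. */
int lvds_bits_per_pixel_period(void)
{
    return LVDS_DATA_LINES * LVDS_SERIALIZATION_FACTOR;
}
```

At 62.5 MPixels / sec each data line runs at 437.5 Mbps. The four lines together provide 28 bit slots per pixel period, which is enough for the 24 RGB bits plus the syncs.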
Agenda
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation
Avnet Board Support Package for Simulink
DM6437
Here is an overview of the Avnet board support package for Simulink for Spartan-3A DSP DaVinci Development kit. It is subdivided into 3 blocksets.
On the left are Simulink blocks that map to physical peripheral devices within the DM6437, such as UART, CAN and the Video-Processing subsystem.
On the right are blocks that are implemented in the Spartan3A-DSP. These blocks are used in the System Generator portion of the Simulink model.
At the bottom are Simulink blocks that map to the DSP, but which communicate with FPGA functions, or physical board-level circuitry via the FPGA, such as LEDs.
The Avnet board support package for Simulink is the result of collaborative work between Avnet and The MathWorks.
• Library of Simulink blocks supporting features of DM6437 on Avnet Spartan-3A DSP DaVinci Evaluation Kit
• Exposes parameters of each peripheral
• Generates API to DSP/BIOS drivers
Avnet Board Support Library for Simulink
Overview of Simulink blocks in BSP to support DM6437. Note the extensive list of parameters offered for each peripheral.
[Figure: DSP core connected to the FPGA co-processor: control over VLYNQ to the VLYNQ LogiCORE, video over VPFE to the VPFE interface, and video over VPBE to the VPBE interface.]
Avnet Board Support Package / VPSS
• VPSS blocks used by automatic code-generation to call DSP/BIOS driver APIs
How is the VPSS connectivity accomplished ?
This is accomplished with Avnet BSP for Simulink, developed in collaboration with The MathWorks. For code generation, the VPFE and VPBE blocks are used by RTW Embedded Coder to call the DSP/BIOS driver API.
[Figure: DSP core connected to the FPGA co-processor: control over VLYNQ to the VLYNQ LogiCORE, video over VPFE to the VPFE interface.]
Avnet Board Support Library / VLYNQ
• VLYNQ block used by automatic code-generation to call VLYNQ DSP/BIOS driver API
How is the VLYNQ connectivity accomplished on the DSP side ?
This is accomplished with Avnet BSP for Simulink, developed in collaboration with The MathWorks. For code generation, the VLYNQ block is used by RTW Embedded Coder to call the DSP/BIOS driver API.
(Recall directory structure of Avnet BSP from lecture 4)
[Figure: design flow with memory map export. Shared memories in the FPGA co-processor (REG, FIFO, RAM) appear as memory-mapped IO on the DSP through VLYNQ, at addresses such as 0400:0000 to 0400:0001, 0800:0000 to 0800:00FF and 0800:0100 to 0801:00FF. The memory map is exported via MATLAB from Xilinx System Generator for DSP (HDL, ISE) to the MathWorks flow (MATLAB® and Simulink®, Real-Time Workshop Embedded Coder, Targets, Links, Link for CCS), which generates C / ASM for TI Code Composer on the Avnet Spartan3A-DSP FPGA / DaVinci platform.]
Passing FPGA Memory Map via MATLAB…
DaVinci processor
Shared memories in the System Generator model destined for the FPGA co-processor are associated with the DM6437 through the ‘DaVinci Processor’ VLYNQ Interface block’s GUI in System Generator. After an association is made, System Generator automatically generates a memory map of all shared memory in the model.
<mouse click>
During code generation, the memory map is exported to Code Composer Studio via the MATLAB workspace to create memory-mapped IO in the DM6437 that communicates with the corresponding registers, FIFOs and RAM elements in the FPGA co-processor over VLYNQ.
<mouse click>
On the FPGA side, System Generator project integration with ISE carries memory mapping information to the VLYNQ IP in ISE, where the final bitstream is created.
<mouse click>
The result is an association between memory-mapped IO space in the DM6437 and registers, FIFOs and RAM memory elements in the FPGA co-processor, which appear to the DM6437 as local memory space through VLYNQ.
Push-button automatic code generation removes all the grunt work of manually maintaining the API, including memory-maps, function headers and C-code device drivers in Code Composer Studio.
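To give a feel for what memory-mapped IO looks like from the DSP side, here is a minimal hand-written C sketch. It is not the generated code: the register names and offsets are invented for the example, and on the target the base pointer would be the local address at which VLYNQ maps the FPGA co-processor, as listed in the exported memory map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte offsets of two shared registers in the co-processor. */
#define COPROC_REG_CTRL_OFFSET   0x00u
#define COPROC_REG_RESULT_OFFSET 0x04u

/* Write a 32-bit value to a co-processor register.
 * base is volatile so the compiler never caches or drops the access. */
void coproc_write32(volatile uint32_t *base, uint32_t byte_offset, uint32_t value)
{
    assert(base != 0 && (byte_offset & 3u) == 0u);  /* word-aligned access */
    base[byte_offset / 4u] = value;
}

/* Read a 32-bit value from a co-processor register. */
uint32_t coproc_read32(volatile uint32_t *base, uint32_t byte_offset)
{
    assert(base != 0 && (byte_offset & 3u) == 0u);
    return base[byte_offset / 4u];
}
```

On a host PC the same functions can be exercised against an ordinary array standing in for the FPGA address range, which is handy for unit-testing driver logic before hardware is available. With automatic code generation none of this is written or maintained by hand.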
[Figure: DSP design (top) and FPGA design (bottom), with the memory map communicated via MATLAB.]
Implementing DSP to FPGA VLYNQ Interface
Here we show usage of the DM6437 VLYNQ Interface blocks in Simulink to connect DM6437 in the top windows to the FPGA co-processor in System Generator in the bottom windows. Note the memory-mapping for a single shared register passed via the MATLAB workspace.
TC6 Automatic Code Generation for DM6437
• VLYNQ DSP/BIOS driver API created by automatic code-generation from Avnet BSP VLYNQ block
Excerpt of auto-generated code from The MathWorks Embedded Coder for TC6 from VLYNQ block in Avnet board support library for Simulink.
• Various FPGA infrastructure on different clock domains
• Multiple Subsystem Generator allows multiple asynchronous clock domains in one System Generator model
Clock Domains in System Generator
Multiple clock domains are handled seamlessly by the Avnet board support package using a powerful feature of System Generator: Multiple Subsystem Generator.
This example shows the VLYNQ interface to the DSP on one clock domain, VPFE for incoming video on another clock domain, and VPBE for video display on a third clock domain.
Note that the top-level FPGA design is finalized in ISE after project export from System Generator.
Avnet Board Support Package / Demos
• Suite of demos integrated into board support package
• FPGA-based co-processors using model based design
A comprehensive suite of demos is integrated into the Simulink board support package for the Avnet Spartan-3A DSP FPGA DaVinci Development Kit. Demos cover these aspects of creation of FPGA-based co-processors using model based design:
•LCD Demo: generate an image on the LCD panel of the Avnet Spartan-3A DSP FPGA DaVinci Development Kit
•Resizer demo: demonstrates two methods for resizing an image
•NTSC to LCD passthrough: demonstrates how to implement an NTSC to LCD passthrough
•SVGA to LCD passthrough: demonstrates how to implement an SVGA to LCD passthrough
•Video surveillance recording: demonstrates a video surveillance recording application with motion-detection algorithm on the DM6437 DSP
•LED Demo: Using a very simple example, a model-based design is gradually targeted to DSP and FPGA hardware.
Avnet Design Resource Center
• Download Board Support Package for Simulink from DRC
Stand-Alone Video Stabilization System
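[Figure: block diagram on the Avnet Xilinx Spartan3A-DSP DaVinci Evaluation Platform. An NTSC video source enters the FPGA and passes through the VPFE scaler and VPFE interface to the DM6437 (with DDR2), which performs image translation. Template and ROI go to the SAD block in the FPGA over VLYNQ (VLYNQ LogiCORE), and the best-match row and column return the same way. Output video leaves over VPBE to the VPBE interface, the 2X scaler and the flat panel controller as 24-bit RGB, driving the XGA flat panel at 1024 x 768, 60 Hz.]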
Block diagram of stand-alone video stabilization system that will be built in lab 5. The architecture of the Avnet Spartan-3A-DSP DaVinci board routes video data through the FPGA towards the DM6437 over the dedicated VPFE video port.
Template and ROI data are sent to the FPGA at each frame for SAD search of template in region of interest (ROI). Best-match result of SAD is sent back to DM6437 over VLYNQ.
Motion vector is used as offset for image translation to stabilize the video from frame-to-frame. Video output is sent over VPBE to FPGA for display on XGA flat panel.
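For reference, the SAD search that the FPGA performs can be modeled in software as an exhaustive search. The C sketch below is only such a reference model, assuming 8-bit grayscale data stored row by row; the type and function names are illustrative and do not correspond to the lab design.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Best-match position of the template inside the ROI, with its SAD score. */
typedef struct {
    int row;
    int col;
    unsigned long sad;
} sad_match_t;

/* Exhaustive Sum-of-Absolute-Differences search of a th x tw template
 * inside an rh x rw region of interest. Both buffers are row-major. */
sad_match_t sad_search(const uint8_t *roi, int rh, int rw,
                       const uint8_t *tmpl, int th, int tw)
{
    sad_match_t best = { -1, -1, ULONG_MAX };
    assert(roi && tmpl && th > 0 && tw > 0 && th <= rh && tw <= rw);

    for (int r = 0; r + th <= rh; r++) {
        for (int c = 0; c + tw <= rw; c++) {
            unsigned long sad = 0;
            for (int i = 0; i < th; i++) {
                for (int j = 0; j < tw; j++) {
                    int diff = (int)roi[(r + i) * rw + (c + j)]
                             - (int)tmpl[i * tw + j];
                    sad += (unsigned long)abs(diff);
                }
            }
            /* Keep the first position with the lowest score. */
            if (sad < best.sad) {
                best.row = r;
                best.col = c;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

The motion vector for a frame is then the difference between the best-match row and column and the position of the template in the previous frame. The nested loops make it clear why this step suits the FPGA: every candidate position is independent, so the absolute differences can be computed in parallel.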
[Figure: model-based design flow, as shown earlier, now with VLYNQ linking DSP and FPGA. MathWorks: MATLAB® and Simulink® (algorithm and system design) feed Real-Time Workshop Embedded Coder, Targets and Links, which generate C / ASM for TI Code Composer through Link for CCS. Xilinx: System Generator for DSP generates HDL for ISE and supports hardware co-simulation. Avnet: the Spartan3A-DSP FPGA / DaVinci platform hosts the DSP and FPGA, with a video source in, an LCD panel out, and ChipScope for verification.]
Integrating the DSP and FPGA Co-processor…
Preview of lab 5:
< mouse click >
1. Implement connectivity in System Generator for data transfer between the DM6437 and FPGA co-processor over VLYNQ.
< mouse click >
2. Continue with automatic code generation of executables for both DSP and FPGA, including the Avnet board support package for Simulink on the Avnet Spartan-3A DSP DaVinci Development Kit.
< mouse click >
3. Conclude with in-system verification techniques for the combined DSP and FPGA co-processor system.
Hardware co-simulation was used for functional verification in lab 4. It is not used for stand-alone implementation, and is shown here as reference only.
Note that video now flows into the system from a live source, contrary to video frames generated by Simulink for hardware co-simulation.
Summary
• Interfacing the DSP and FPGA Co-Processor
• Avnet Spartan3A-DSP DaVinci Platform with PS Video EXP Module
• Model-Based Infrastructure for Stand-Alone Implementation
… proceed to lab 5 Integrating the DSP and FPGA Co-processor
[Figure: VLYNQ data flow between the local VLYNQ in the DM6437 (DSP core, Video Processing Subsystem) and the remote VLYNQ in the FPGA co-processor, which connects to a custom interface.]
VLYNQ block diagram.
The previous slide showed memory mapping between the local (host) device’s address space and the remote address space. This is accomplished via the address translation blocks. A remote VLYNQ device is mapped to the local device’s address via the address map registers (TX address map, RX address map size n, RX address map offset n, where n = 1 to 4). For clarity, the map registers aren’t shown on the block diagram above.
The data flow between two VLYNQ devices is shown here, in which the write originates from the DM643x slave configuration bus interface towards the outbound command (CMD) FIFO after address translation. Data is subsequently read from the FIFO and encapsulated in a write request packet. The packet is encoded and serialized before being transmitted to the remote VLYNQ in the FPGA.
The remote device subsequently de-serializes and decodes the received data and writes it into the inbound CMD FIFO. A write operation initiates on the FPGA VLYNQ OPB (On-Chip Peripheral Bus) master bus interface after reading the address and data from the FIFO. The 32-bit OPB interface standard can connect directly to an embedded processor in the FPGA, or to a custom user interface, as shown.
Finally, address decoding can deliver the data to register(s) of the addressed peripheral.
The Xilinx VLYNQ serial interface is not directly coupled to the OPB interface; there are asynchronous FIFOs between the two interface domains, and the interfaces operate independently. However, if the OPB fails to generate sufficient commands and data to consume all the VLYNQ interface’s bandwidth, the VLYNQ interface generates idle packets. If the OPB fails to immediately accept all remotely generated commands and data, the FIFOs fill and the VLYNQ interface turns flow control on.
Reference:
TMS320DM643x DMP VLYNQ Port User's Guide, Literature Number: SPRU938B, Section 2.5.1
Xilinx VLYNQ v1.3 / Core Generator 10.1, Literature Number: DS324
http://focus.ti.com/lit/ug/spru938b/spru938b.pdf
www.xilinx.com/products/ipcenter/DO-DI-VLYNQ.htm
VLYNQ References
VLYNQ documentation consists of the TMS320DM643x DMP VLYNQ Port User’s Guide from TI and of the VLYNQ LogiCore datasheet from Xilinx.
vlynq_config.peer_tx_addr = 0;
vlynq_config.local_rtm_cfg_type = no_rtm_cfg;
vlynq_config.peer_rtm_cfg_type = no_rtm_cfg;
vlynq_config.local_tx_fast_path = FALSE;
vlynq_config.peer_tx_fast_path = FALSE;

/* Initialize the VLYNQ control module */
ptr_vlynq = PAL_sysVlynqInitSoc(&vlynq_config);

if (NULL == ptr_vlynq)
{
    VLYNQ_DEBUG("VLYNQ :Failed to initialize the vlynq 0x%08x\n\r",
                vlynq_config.base_addr);
    VLYNQ_DEBUG("VLYNQ :The error msg: %s\n\r", vlynq_config.error_msg);
    goto av_vlynq_init_fail;
}

/* Map memory regions of device for remote/local VLYNQ depending on
   region ID to be mapped and the size and offset. */
while (init_p_region->id > -1)
{
    if (VLYNQ_APP_SUCCESS != PAL_sysVlynqMapRegion(ptr_vlynq,
            init_p_region->remote, init_p_region->id,
            init_p_region->offset, init_p_region->size, ptr_vlynq_dev))
VLYNQ DSP/BIOS Driver
On the TI SoC software side, the VLYNQ peripheral is controlled through a set of functions in the API (application programming interface) provided by the VLYNQ device driver.
Shown above are two of the preparatory steps to activate VLYNQ: PAL_sysVlynqInitSoc initializes the VLYNQ control module, and PAL_sysVlynqMapRegion maps memory regions of the device for remote or local VLYNQ according to the region ID, size and offset.
Refer to the VLYNQ device driver architecture documentation for a full description of all functions in the API.
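The mapping loop in the excerpt runs until it meets a region whose ID is negative, which suggests a table terminated by a sentinel entry. A minimal sketch of that pattern follows. The descriptor type `region_desc_t`, the stand-in `map_region_stub`, and the helper `map_all_regions` are all hypothetical; only the field names (id, remote, offset, size) are borrowed from the excerpt, and the stub does not reproduce the behavior of PAL_sysVlynqMapRegion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor; field names mirror the slide excerpt. */
typedef struct {
    int      id;     /* region ID; a negative value terminates the table */
    int      remote; /* nonzero if the region belongs to the peer device */
    uint32_t offset; /* offset of the region */
    uint32_t size;   /* size of the region in bytes */
} region_desc_t;

/* Stand-in for the driver's mapping call (an assumption, not the real API).
 * Returns 0 on success and -1 for a zero-sized region. */
static int map_region_stub(const region_desc_t *r)
{
    return (r->size == 0u) ? -1 : 0;
}

/* Walk the table until the sentinel entry is reached.
 * Returns the number of regions mapped, or -1 on the first failure. */
static int map_all_regions(const region_desc_t *table)
{
    int mapped = 0;

    assert(table != NULL);
    for (const region_desc_t *r = table; r->id > -1; r++) {
        if (map_region_stub(r) != 0) {
            return -1;
        }
        mapped++;
    }
    return mapped;
}
```

A table for two regions would end with an entry such as `{ -1, 0, 0u, 0u }`, so adding or removing a region only changes the table and not the loop.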
Avnet Tools:
- avnet_3adsp_dm6437_0_04
AVNET_S3ADSP_DM6437_INSTALL_DIR => C:\avnet_s3adsp_dm6437_0_04
PSP_EVMDM6437_INSTALLDIR => %AVNET_S3ADSP_DM6437_INSTALL_DIR%\psp
CSLR_DM6437_INSTALLDIR => %AVNET_S3ADSP_DM6437_INSTALL_DIR%\psp\pspdrivers\soc\dm6437\dsp\inc
Modified version of C:\dvsdk_1_01_00_15\psp_1_00_02_00
Modified version of C:\dvsdk_1_01_00_15\ndk_1_92_00_22_eval
DSP drivers (CCS specific)
FPGA logic (ISE specific)
DSP blockset (Target Support Package TC6 & Embedded IDE Link CC specific)
FPGA blockset (SysGen specific)
Avnet BSP Installation Package
Once installed, the Avnet Spartan-3A DSP DaVinci board support package consists of the above directory structure.
Note:
• NDK = Modified-for-Avnet version of DVSDK for TI DM6437 EVM: C:\dvsdk_1_01_00_15\ndk_1_92_00_22_eval
• PSP = Modified-for-Avnet version of PSP DSP/BIOS drivers for TI DM6437 EVM: C:\dvsdk_1_01_00_15\psp_1_00_02_00
DSP blockset (Target Support Package TC6 & Embedded IDE Link CC specific)
FPGA blockset (System Generator specific)
DSP drivers (Code Composer Studio specific)
FPGA logic (ISE specific)
Spartan-3A DSP DaVinci Board Support Package
Network Developer’s Kit (DSP/BIOS)
PSP Drivers for DM6437 (DSP/BIOS)
Ethernet Hardware Co-Simulation support files
Once installed, the Avnet Spartan-3A DSP DaVinci board support package consists of the above directory structure. We concentrate here on Ethernet hardware co-simulation support files. All other components of the BSP will be presented in lecture 5.
Ethernet Hardware Co-Simulation Support Files
• Board appears in list of targets for Ethernet hardware co-simulation
Avnet provides Ethernet hardware co-simulation support files for the Spartan-3A DSP DaVinci, as well as for several Avnet Virtex-5 evaluation kits. The support files, known as ‘plugins’, are packaged in a standard format for the System Generator plugin installer ‘xlinstallplugin’. Once installed under the directory tree shown here, the board appears in the target list for Ethernet point-to-point hardware co-simulation.
Installation Package
BSL – Board Support Libraries
MSL – Model Support Libraries
LED Demo
bsl\dsp\gel:
- avnet_s3adsp_dm6437.ccs => CCS setup for BlackHawk USB510L
- avnet_s3adsp_dm6437.gel => GEL file for Avnet board
bsl\dsp\src:
bsl\dsp\inc:
- dm6437_init.c/.h => various init/config routines
- fpga_interface.c/.h => FPGA device driver (apply/release reset)
- vlynq_interface.c/.h => VLYNQ device driver
- led_interface.c/.h => LED device driver
- dip_interface.c/.h => DIP Switch device driver
- vpss_interface.h => contains a number of useful defines
bsl\dsp\dspbios:
- Platform.tci => ??
BSL – DSP drivers
bsl\fpga\rtl:
- pattern => XGA pattern generator (color bars + moving logo)
- lcd => LCD flat panel interface
- picoblaze => PicoBlaze-based I2C controller
- vlynq => VLYNQ interface core
- video => video interfaces (stddef, hidef, vpfe, vpbe)
- debug => ChipScope debug module
- top_level => top level designs
bsl\fpga\chipscope:
- ChipScope Analyzer project for FPGA debug
bsl\fpga\ucf:
- constraints file for FPGA designs
bsl\fpga\ise:
- davinci_coprocessor_stddef => example design for Composite input
- davinci_coprocessor_hidef => example design for VGA input
BSL – FPGA Logic
DIP Switch:
- Reads one of SW10[1:4] switches (cannot be used with VPFE/VPBE)
LED:
- Writes to one of D7, D8, D9, D10 LEDs
VLYNQ Read/Write:
- Reads/Writes to FPGA peripherals via VLYNQ
MSL – DSP Logic
DaVinci Processor:
- similar to Xilinx’s EDK Processor block
- automatically creates VLYNQ bus logic to all shared regs/fifos/mems
- creates memory map
I2C Controller:
- PicoBlaze-based I2C Controller
- Command Port via request/response FIFOs
MSL – FPGA Blockset
VLYNQ bus logic
Automatically created
FPGA design
Implementing DSP to FPGA VLYNQ Interface
Memories used in the co-processor are associated with the DaVinci processor through the block’s GUI in System Generator.
After an association is made, System Generator automatically generates an interface that marshals data to and from the processor over VLYNQ. On the DaVinci side, Target for C6000 handles automatic code generation. Having the co-processor and the processor in the same development environment removes the grunt work of manually maintaining the API, including memory maps, function headers and C-code device drivers in Code Composer Studio.
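To make concrete what a generated memory map spares the developer from maintaining by hand, here is a minimal hand-written sketch of memory-mapped access to two shared registers. Everything in it is an assumption for illustration: the register names, the word offsets, and the helper functions are hypothetical and do not reproduce the code that System Generator or Target for C6000 emits. The base pointer is passed in as a parameter so the sketch can be exercised on a host with an ordinary array standing in for the VLYNQ-mapped window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory map: word offsets of two shared registers
 * inside the window that VLYNQ maps into the DSP address space. */
#define REG_THRESHOLD_OFFSET 0u
#define REG_STATUS_OFFSET    1u

/* Write one 32-bit word to a shared register. */
static void coproc_write(volatile uint32_t *base, uint32_t word_offset, uint32_t value)
{
    assert(base != NULL);
    base[word_offset] = value;
}

/* Read one 32-bit word from a shared register. */
static uint32_t coproc_read(const volatile uint32_t *base, uint32_t word_offset)
{
    assert(base != NULL);
    return base[word_offset];
}

/* Example driver routine: program a threshold, then report whether
 * bit 0 of the status register is set. */
static int coproc_set_threshold(volatile uint32_t *base, uint32_t threshold)
{
    coproc_write(base, REG_THRESHOLD_OFFSET, threshold);
    return (int)(coproc_read(base, REG_STATUS_OFFSET) & 1u);
}
```

If a register is added or moved in the FPGA design, the offsets and every routine that uses them must change in step; that is the bookkeeping the automatically generated interface takes over.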
LED Demo – DIP Implementations
for simulation only
for DSP build
LED Demo – LED Implementations
for simulation only
for DSP build
Serial RapidIO™ Enables Increased Bandwidth (TI TMS320C6455, C6474, etc.)
• C6455 Serial RapidIO Support
– IEEE 1149.6 Compliant
– 1.25, 2.5, 3.125 GBit/sec per link
   Up to four 1x links (each 1x link is bidirectional)
   --OR--
   Up to one 4x link (bidirectional pipe), which provides up to 12.5 GBit/sec
– Resulting range 10 – 25 GBits/sec total (1.25 – 3.125 GBytes/sec)
– Supports DSP-to-DSP on the same board, DSP-to-Switch, DSP-to-FPGA, etc.
• Benefits
– 1x Link is fast enough to send HD 1080i raw video between devices
– 4x Link is easily fast enough to send HD 1080p raw video between devices
– Reduction in chip count, board area and system cost
Serial RapidIO is a high-performance, packet-switched interconnect technology that addresses the embedded industry's need for reliability, increased bandwidth and faster bus speeds.
Serial RapidIO allows chip-to-chip and board-to-board communications at performance levels scaling to ten gigabits per second and beyond.
TI customers asked for faster I/O performance, and TI listened. TI is bus agnostic, so it is worth first explaining why TI chose Serial RapidIO for the C6455:
• High performance for HD video and telecom channel density
• Worldwide standard, multiple applications, broad OEM adoption
• Flexible, scalable rates and widths (1x or 4x)
• Low pin count and low power per link
TI was part of the consortium that defined the standard with other industry leaders.
The theoretical payload bandwidth is up to 25 Gbits/sec, but any communications protocol carries some overhead (addresses, acknowledgements, error correction), so the achievable rate may be closer to 19 or 20 Gbits/sec.
From a video infrastructure applications perspective, the 1x Link is fast enough to send HD 1080i raw video between devices and the 4x link can easily send HD 1080p raw video between devices. The use of SRIO in infrastructure applications with large “DSP farms” may allow the reduction of FPGA cost (quantity, pin count, size and/or cost) for our OEMs.
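The bandwidth figures quoted on this slide can be checked with a few lines of arithmetic. The sketch below reproduces the 12.5 GBit/sec one-way figure and the 10 to 25 GBits/sec total range from the per-link rates. The raw video rates rest on assumptions that are not in the slide: 16 bits per pixel (4:2:2 sampling), active pixels only, 1080i counted as 30 full frames per second and 1080p as 60 frames per second. Function names are hypothetical, and protocol overhead is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* One-way aggregate rate in Mbit/s for `links` 1x links (or lanes of a
 * 4x link) running at `link_mbps` each. */
static uint32_t srio_one_way_mbps(uint32_t links, uint32_t link_mbps)
{
    return links * link_mbps;
}

/* Total rate in Mbit/s counting both directions of the bidirectional link. */
static uint32_t srio_total_mbps(uint32_t links, uint32_t link_mbps)
{
    return 2u * srio_one_way_mbps(links, link_mbps);
}

/* Raw active-video rate in bit/s; bits_per_pixel is an assumption
 * supplied by the caller (16 for 4:2:2 sampling). */
static uint64_t video_bps(uint32_t width, uint32_t height,
                          uint32_t frames_per_sec, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel > 0u);
    return (uint64_t)width * height * frames_per_sec * bits_per_pixel;
}
```

Under these assumptions, `video_bps(1920, 1080, 30, 16)` comes to roughly 1.0 Gbit/sec and `video_bps(1920, 1080, 60, 16)` to roughly 2.0 Gbit/sec, which can be compared against `srio_one_way_mbps(1, 1250)` through `srio_one_way_mbps(4, 3125)` to see how much headroom each link configuration leaves.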