Date posted: 18-Aug-2015
Category: Devices & Hardware
Uploaded by: hans-kuo
OUTLINE
Introduction to DSP Processor
C6000 Architecture
C6000 Memory Map
Homework 1
Silicon Solutions
Decision table for designers of real-time
“Choosing the Right Architecture for Real-Time Signal Processing Designs”, Leon Adams, Texas Instruments
Programmability: GPP > DSP > FPGA > ASIC
Performance: ASIC > FPGA > DSP > GPP
Example: wireless communication
GPP: OS, network protocol
DSP: A/V codec
ASIC, FPGA: Reed-Solomon, Viterbi decoder
Evaluating Category    ASIC   FPGA   DSP   GPP
Programmability        1      4      5     5
Development Cycle      2      3      4     5
Performance            5      5      4     2
Power consumption      4      2      2     2

GPP: general-purpose processor
DSP: digital signal processor
FPGA: field-programmable gate array
ASIC: application-specific IC
TI Embedded Processors

16-bit Microcontrollers: MSP430
  Ultra-Low Power
  Up to 25 MHz
  Flash 1 KB to 256 KB
  Analog I/O, ADC, LCD, USB, RF
  Measurement, Sensing, General Purpose
  $0.49 to $9.00

32-bit Real-time: C2000™
  Fixed & Floating Point
  Up to 300 MHz
  Flash 32 KB to 512 KB
  PWM, ADC, CAN, SPI, I2C
  Motor Control, Digital Power, Lighting, Sensing
  $1.50 to $20.00

32-bit ARM (MCU): ARM M3/M4
  Industry Std, Low Power
  <100 MHz
  Flash 64 KB to 1 MB
  USB, ENET, ADC, PWM, SPI
  Host Control
  $2.00 to $8.00

ARM-Based, ARM (MPU): ARM9, Cortex A-8
  Industry-Std Core, High-Perf GPP
  Accelerators
  MMU
  USB, LCD, MMC, EMAC
  Linux/WinCE User Apps
  $8.00 to $35.00

ARM-Based, ARM + DSP: DaVinci, OMAP
  Industry-Std Core + DSP for Signal Proc.
  4800 MMACs / 1.07 DMIPS/MHz
  MMU, Cache
  VPSS, USB, EMAC, MMC
  Linux/Win + Video, Imaging, Multimedia
  $12.00 to $65.00

DSP: C647x, C64x+, C674x, C55x
  Leadership DSP Performance
  24,000 MMACS
  Up to 3 MB L2 Cache
  1G EMAC, SRIO, DDR2, PCI-66
  Comm, WiMAX, Industrial/Medical Imaging
  $4.00 to $99.00+
DSP Applications
Why do we need DSP processors?
The Sum of Products (SOP), or multiply-accumulate (MAC), is the key element in most DSP algorithms:
Algorithm: Equation
Finite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k \, x(n-k)$
Infinite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k \, x(n-k) + \sum_{k=1}^{N} b_k \, y(n-k)$
Convolution: $y(n) = \sum_{k=0}^{N} x(k) \, h(n-k)$
Discrete Fourier Transform: $X(k) = \sum_{n=0}^{N-1} x(n) \exp[-j(2\pi/N)nk]$
Discrete Cosine Transform: $F(u) = \sum_{x=0}^{N-1} c(u) \, f(x) \cos\left[\frac{\pi}{2N} u (2x+1)\right]$
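To make the multiply-accumulate pattern concrete, here is a minimal C sketch of one FIR output sample: each pass through the loop is one multiply and one add. The function name `fir_sample`, the 16-bit sample type, and the buffer layout are assumptions for illustration and do not come from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* One FIR output: y(n) = sum over k = 0..M of a[k] * x[n - k].
 * Assumption: the caller guarantees n >= M, so x[n - M] .. x[n] exist. */
int32_t fir_sample(const int16_t *a, const int16_t *x, size_t n, size_t M)
{
    int32_t acc = 0;                        /* running sum of products */
    for (size_t k = 0; k <= M; k++) {
        acc += (int32_t)a[k] * x[n - k];    /* one multiply-accumulate per tap */
    }
    return acc;
}
```

The other equations in the table reduce to the same inner loop with different indexing, which is why a fast MAC matters so much.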
Hardware vs. Software multiplication
DSP processors are optimized to perform multiplication and addition operations.
Multiplication and addition are done in hardware and in one cycle.
Example: 4-bit multiply (unsigned), 1011 x 1110.
Hardware: 1011 x 1110 = 10011010 in a single cycle.
Software: one partial product per cycle, then the sum.
Cycle 1:     0000
Cycle 2:    1011.
Cycle 3:   1011..
Cycle 4:  1011...
Cycle 5: 10011010
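The software method can be sketched in C as a shift-and-add loop. This is only an illustration of the idea on the slide; the function name `mul4_shift_add` is hypothetical, and the loop passes stand in for the cycles a processor without a hardware multiplier would spend.

```c
#include <stdint.h>

/* Multiply two unsigned 4-bit values by shift-and-add.
 * Each loop pass examines one bit of b and, if it is set, adds the
 * correspondingly shifted copy of a (one partial product). */
uint8_t mul4_shift_add(uint8_t a, uint8_t b)
{
    uint8_t product = 0;
    a &= 0x0F;                               /* keep only 4 bits */
    b &= 0x0F;
    for (int bit = 0; bit < 4; bit++) {
        if (b & (1u << bit)) {
            product = (uint8_t)(product + (a << bit));
        }
    }
    return product;                          /* at most 15 * 15 = 225 */
}
```

For the slide's operands, `mul4_shift_add(0x0B, 0x0E)` returns 0x9A, which is 10011010 in binary.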
C6000 System Block Diagram
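[Figure: C6000 system block diagram. The CPU contains functional units .D1, .M1, .L1, .S1 with register file A (A0-A15), functional units .D2, .M2, .L2, .S2 with register file B (B0-B15), and the control registers. Internal buses link the CPU to internal memory, external memory, and the peripherals.]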
C6000 Central Processing Unit
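[Figure: the CPU within the C6000 block diagram: functional units .D1, .M1, .L1, .S1 with register file A (A0-A15), functional units .D2, .M2, .L2, .S2 with register file B (B0-B15), and the control registers, alongside internal memory, internal buses, external memory, and peripherals.]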
Implementation of Sum of Products (SOP)
SOP is the key element for most DSP algorithms.
Let's write the code for this algorithm and at the same time discover the C6000 architecture.
The implementation in this module will be done in assembly.
Two basic operations are required for this algorithm:
(1) Multiplication
(2) Addition
Therefore two basic instructions are required.
$Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$
Multiply (MPY)
The multiplication of a1 by x1 is done in assembly by the following instruction:
MPY a1, x1, Y
This instruction is performed by a multiplier unit that is called “.M”
$Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$
Multiply (.M unit)
$Y = \sum_{n=1}^{40} a_n x_n$
The .M unit performs multiplications in hardware.
MPY .M a1, x1, Y
Addition (.?)
$Y = \sum_{n=1}^{40} a_n x_n$
MPY .M a1, x1, prod
ADD .? Y, prod, Y
Add (.L unit)
$Y = \sum_{n=1}^{40} a_n x_n$
MPY .M a1, x1, prod
ADD .L Y, prod, Y
The C6000 uses registers to hold the operands, so let's change this code.
Register File - A
$Y = \sum_{n=1}^{40} a_n x_n$
MPY .M a1, x1, prod
ADD .L Y, prod, Y
[Figure: Register File A (registers A0 to A15, each 32 bits wide) feeding the .M and .L units; a1 is held in A0, x1 in A1, prod in A3, and Y in A4.]
Let us correct this by replacing a1, x1, prod and Y with the registers shown in the figure.
Specifying Register Names
$Y = \sum_{n=1}^{40} a_n x_n$
MPY .M A0, A1, A3
ADD .L A4, A3, A4
Register File A contains 16 registers (A0-A15), each 32 bits wide.
[Figure: Register File A feeding the .M and .L units; a1 is held in A0, x1 in A1, prod in A3, and Y in A4.]
Data loading
Q: How do we load the operands into the registers?
[Figure: Register File A (A0-A15, 32 bits wide) holding a1, x1, prod, and Y, with the .M and .L units.]
Load Unit “.D”
Q: How do we load the operands into the registers?
A: The operands are loaded from memory into the registers using the .D unit.
Q: Which instruction(s) can be used for loading operands from memory into the registers?
A: The load instructions (LDB, LDH, LDW, LDDW).
[Figure: the .D unit sits between data memory and Register File A (A0-A15, 32 bits wide), which holds a1, x1, prod, and Y and feeds the .M and .L units.]
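The four load instructions differ in how many bytes they fetch: LDB a byte, LDH a halfword (16 bits), LDW a word (32 bits), and LDDW a doubleword (64 bits). A rough C analogue is sketched below; the helper names are hypothetical, and the sign extension shown for the byte and halfword cases reflects the signed forms of those loads.

```c
#include <stdint.h>
#include <string.h>

/* LDB analogue: fetch 8 bits and sign-extend into a 32-bit value. */
int32_t load_byte(const void *addr)
{
    int8_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}

/* LDH analogue: fetch 16 bits and sign-extend into a 32-bit value. */
int32_t load_halfword(const void *addr)
{
    int16_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}

/* LDW analogue: fetch a full 32-bit word. */
int32_t load_word(const void *addr)
{
    int32_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}

/* LDDW analogue: fetch 64 bits (a register pair on the real device). */
int64_t load_doubleword(const void *addr)
{
    int64_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}
```

The SOP example uses LDH, which suggests that the coefficients and samples are stored as 16-bit values.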
Using the Load Instructions
$Y = \sum_{n=1}^{40} a_n x_n$
LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
[Figure: the .D unit loads operands from data memory into Register File A (A0-A15, 32 bits wide), which holds a1, x1, prod, and Y and feeds the .M and .L units.]
Creating a loop
So far we have implemented the SOP for only one tap, i.e.
Y = a1 * x1
So let's create a loop so that we can implement the SOP for N taps.
$Y = \sum_{n=1}^{40} a_n x_n$
LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
Create a label to branch
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
$Y = \sum_{n=1}^{40} a_n x_n$
Add a branch instruction, B.
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        B   .? loop
$Y = \sum_{n=1}^{40} a_n x_n$
Which unit is used by the B instruction?
$Y = \sum_{n=1}^{40} a_n x_n$
[Figure: the .S unit joins the .M, .L, and .D units; Register File A (A0-A15, 32 bits wide) holds a1, x1, prod, and Y, and the .D unit connects to data memory.]
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        B   .S loop
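For comparison, the loop built up over the last few slides corresponds roughly to the C sketch below. The comments map each C step to the instruction used on the slides. The function name, the pointer increments, and the loop counter are assumptions added for illustration; the assembly shown so far does not yet include them.

```c
#include <stdint.h>

#define TAPS 40   /* N = 40, as in the slide equation */

/* Sum of products over TAPS coefficient/sample pairs. */
int32_t sum_of_products(const int16_t *a, const int16_t *x)
{
    int32_t y = 0;                          /* A4: running sum Y        */
    for (int n = 0; n < TAPS; n++) {        /* B   .S loop              */
        int16_t an = *a++;                  /* LDH .D *A5, A0           */
        int16_t xn = *x++;                  /* LDH .D *A6, A1           */
        int32_t prod = (int32_t)an * xn;    /* MPY .M A0, A1, A3        */
        y += prod;                          /* ADD .L A4, A3, A4        */
    }
    return y;
}
```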
How can we add more processing power to this processor?
[Figure: the .S, .M, .L, and .D units with Register File A (A0-A15, 32 bits wide) and data memory.]
(1) Increase the clock frequency.
(2) Increase the number of processing units.
Increase the number of Processing units
[Figure: two sets of units. .S, .M, .L, and .D work with Register File A (A0-A15, 32 bits wide); .S2, .M2, .L2, and .D2 work with Register File B (B0-B15, 32 bits wide). Data memory is shown alongside.]
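One way to picture what a second set of units buys is to split the sum of products into two independent partial sums. The C sketch below is only an illustration under that assumption: the function name is hypothetical, and how work is actually divided between the A and B sides is decided by the programmer or the compiler, not by this code.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product kept as two independent partial sums. Neither line in the
 * loop body depends on the other, so two sets of functional units could
 * in principle work on them at the same time. */
int32_t sop_two_paths(const int16_t *a, const int16_t *x, size_t n)
{
    int32_t sum_a = 0;                      /* think: register file A side */
    int32_t sum_b = 0;                      /* think: register file B side */
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        sum_a += (int32_t)a[i] * x[i];
        sum_b += (int32_t)a[i + 1] * x[i + 1];
    }
    if (i < n) {                            /* odd tap count: one pair left */
        sum_a += (int32_t)a[i] * x[i];
    }
    return sum_a + sum_b;
}
```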
C6211 Instruction Set (by unit)
.S Unit
ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, MVKLH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
.M Unit
MPY, MPYH, SMPY, SMPYH
.L Unit
ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
.D Unit
ADD, ADDA, LDB/H/W, MV, NEG, STB/H/W, SUB, SUBA, ZERO
Other
IDLE, NOP
C language vs Assembly
Source                             Efficiency   Effort
C (Compiler Optimizer)             70-100%      Low
Linear ASM (Assembly Optimizer)    95-100%      Med
ASM (Hand Optimize)                100%         High
'C6x Peripherals
[Figure: the peripherals within the C6000 block diagram, alongside the CPU (units .D1, .M1, .L1, .S1, .D2, .M2, .L2, .S2, register files A0-A15 and B0-B15, control registers), internal memory, internal buses, and external memory.]
'C6x Peripherals
EMIF (External Memory Interface)
- Glueless access to async/sync memory: EPROM, SRAM, SDRAM, SBSRAM
DMA/EDMA (Enhanced Direct Memory Access)
- 4/16 channels
BOOT
- Boot from 4M external block
- Boot from HPI/XB
McBSP (Multi-Channel Buffered Serial Port)
- High-speed sync serial comm
- T1/E1/MVIP interface
HPI (Host Port Interface) / Expansion Bus (XB)
- 16/32-bit host µP access
Timer/Counters
- Two 32-bit timer/counters
[Figure: the 'C6x CPU surrounded by EMIF (to external memory), DMA, Boot, McBSP, HPI/XB, Timer, and PLL.]
C6000 Memory
[Figure: internal and external memory within the C6000 block diagram, alongside the CPU (units .D1, .M1, .L1, .S1, .D2, .M2, .L2, .S2, register files A0-A15 and B0-B15, control registers), internal buses, and peripherals.]
C6416 Memory Map
Address      Contents
0000_0000    Internal memory (L2): unified (data or program), 1024 KB
0180_0000    On-chip peripherals
6000_0000    EMIFB: external memory, 64 MB x 4
8000_0000    EMIFA: external memory, 256 MB x 4
FFFF_FFFF    Top of the address space
External memory: async (SRAM, ROM, etc.) or sync (SBSRAM, SDRAM)
Level 1 cache: 16 KB program, 16 KB data; not in the memory map
[Figure: the CPU with a 16 KB program cache (16KP) and a 16 KB data cache (16KD) in front of the 1024K L2.]
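As a rough sketch, the memory map can be turned into an address classifier in C. The enum and function names are hypothetical. Two assumptions are labeled in the comments: the map gives only the start of the peripheral space, so its upper bound here is a guess, and the four EMIF spaces are treated as one contiguous block starting at each base address.

```c
#include <stdint.h>

typedef enum {
    REGION_INTERNAL,      /* 0x0000_0000, 1024 KB internal (L2)        */
    REGION_PERIPHERALS,   /* starts at 0x0180_0000                     */
    REGION_EMIFB,         /* 0x6000_0000, 64 MB x 4                    */
    REGION_EMIFA,         /* 0x8000_0000, 256 MB x 4                   */
    REGION_OTHER          /* not covered by the ranges listed above    */
} region_t;

region_t classify_address(uint32_t addr)
{
    if (addr < 0x00100000u) {
        return REGION_INTERNAL;
    }
    /* Assumption: everything from the peripheral base up to EMIFB is
     * lumped together, since the map does not state where it ends. */
    if (addr >= 0x01800000u && addr < 0x60000000u) {
        return REGION_PERIPHERALS;
    }
    /* Assumption: 4 x 64 MB contiguous, i.e. 0x1000_0000 bytes. */
    if (addr >= 0x60000000u && addr < 0x70000000u) {
        return REGION_EMIFB;
    }
    /* Assumption: 4 x 256 MB contiguous, i.e. 0x4000_0000 bytes. */
    if (addr >= 0x80000000u && addr < 0xC0000000u) {
        return REGION_EMIFA;
    }
    return REGION_OTHER;
}
```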
Memory Allocation
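[Figure: C source code passes through the compiler and assembler to produce a COFF object file containing Text, Data, and Bss sections. The MEMORY description and SECTION assignments then map these sections, together with the Stack and Heap, onto target memory (ROM, external RAM, internal RAM) between 0x00000 and 0xfffff.]
Memory Layout
MEMORY
{
    ISRAM : origin = 0x00000000, len = 0x00100000
}
SECTIONS
{
    .text > ISRAM
}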
What is stored in memory?
Code
Constants
Global and static variables
Local variables
Dynamic memory
[Figure: memory from 0x00000 to 0xfffff.]
How is memory organized?
text: code and constant data
data: initialized global and static variables
bss: uninitialized global and static variables
stack: local variables, function return addresses, function arguments
heap: dynamic memory
[Figure: memory from 0x00000 to 0xfffff divided into stack, heap, bss, data, and text regions.]
How is memory allocated?
#include <stdlib.h>

long array[100];
long bufsize = 100;
char *f1(int n);

int main(void)
{
    int i;
    char *buf;
    i = 10;
    buf = f1(i);
    return 0;
}

char *f1(int n)
{
    int k;
    return malloc(bufsize);
}

[Figure: memory from 0x00000 to 0xfffff.]
text: the code of main() and f1()
data: bufsize = 100
bss: array[100]
heap: 100 byte block
stack: main return address, i, buf, f1 argument n, f1 return address, k
Memory Allocation & Deallocation
How, and when, is memory allocated?
Global and static variables = program startup
Local variables = function call
Dynamic memory = malloc()
How, and when, is memory deallocated?
Global and static variables = program finish
Local variables = function return
Dynamic memory = free()
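The three lifetimes can be seen side by side in a small C sketch. The names `call_count`, `copy_to_heap`, and `copies_made` are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

static long call_count;   /* static: allocated at program startup, lives until the program finishes */

/* Returns a heap copy of src. The caller owns the block and must free() it. */
char *copy_to_heap(const char *src)
{
    size_t len = strlen(src) + 1;   /* local: allocated at the call, released at return */
    char *buf = malloc(len);        /* dynamic: allocated here, released only by free() */
    if (buf != NULL) {
        memcpy(buf, src, len);
    }
    call_count++;
    return buf;
}

/* Number of copies requested so far; the counter outlives every call. */
long copies_made(void)
{
    return call_count;
}
```

A caller would write `char *p = copy_to_heap("dsp");`, use `p`, and finish with `free(p);`. Without that `free()`, the block stays allocated even after every function that knew about it has returned.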
When is memory allocated?
#include <stdlib.h>

long array[100];
long bufsize = 100;
char *f1(int n);

int main(void)
{
    int i;
    char *buf;
    i = 10;
    buf = f1(i);
    return 0;
}

char *f1(int n)
{
    int k;
    return malloc(bufsize);
}

array (bss): 0 at startup
bufsize (data): 100 at startup
i, buf (stack): at function call
n, k (stack): at function call
malloc(bufsize) (heap): 100 bytes at malloc()
When is memory deallocated?
#include <stdlib.h>

long array[100];
long bufsize = 100;
char *f1(int n);

int main(void)
{
    int i;
    char *buf;
    i = 10;
    buf = f1(i);
    return 0;
}

char *f1(int n)
{
    int k;
    return malloc(bufsize);
}

array: available till termination
bufsize: available till termination
i, buf: deallocated on return from main()
n, k: deallocated on return from f1()
malloc(bufsize) block: deallocated on free()
Sections defined in C6000 compiler
Initialized sections
.cinit : initial values for global/static variables
.const : global and static string literals
.switch : tables for switch instructions
.text : code
Uninitialized sections
.bss : global and static variables
.stack : stack (local variables, return address, arguments)
.far : globals and statics declared far
.sysmem : memory for malloc functions (heap)
Example : 6416 DSK
[Figure: C6416 DSK board with its 16 MB and 512 KB external memories labeled.]
Example : C6416 DSK
                   Base          Length
Internal Memory    0x00000000    0x00100000 (1024K)
External SDRAM     0x80000000    0x01000000 (16M)
External Flash     0x64000000    0x00080000 (512K)
Linker command file (*.cmd)
MEMORY directive: system memory description
Name : origin = address, length = size-in-bytes
MEMORY
{
    ISRAM : origin = 0x00000000, len = 0x00100000
    SDRAM : origin = 0x80000000, len = 0x01000000
    FLASH : origin = 0x64000000, len = 0x00080000
}
Linker command file (*.cmd)
SECTIONS directive: binding sections to memory
SECTIONS
{
    .text  > ISRAM
    .bss   > ISRAM
    .cinit > ISRAM
    …
}
C6416.cmd
-stack 0x400
MEMORY
{
    ISRAM : origin = 0x00000000, len = 0x00100000
    SDRAM : origin = 0x80000000, len = 0x01000000
    FLASH : origin = 0x64000000, len = 0x00080000
}
SECTIONS
{
    .text  > ISRAM
    .bss   > ISRAM
    .cinit > ISRAM
    .stack > ISRAM
    …
}
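To see what binding a section to a memory region involves, here is a deliberately simplified C sketch that places sections one after another in a region and refuses any that do not fit. It is not the TI linker's algorithm: it ignores alignment and ordering rules, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    const char *name;     /* e.g. "ISRAM"                      */
    uint32_t    origin;   /* start address, as in MEMORY       */
    uint32_t    length;   /* size in bytes, as in MEMORY       */
    uint32_t    used;     /* bytes already handed out          */
} mem_region;

/* Try to place `size` bytes in region `r`.
 * On success, store the assigned address in *addr and return 1.
 * Return 0 if the remaining space is too small. */
int place_section(mem_region *r, uint32_t size, uint32_t *addr)
{
    if (size > r->length - r->used) {
        return 0;
    }
    *addr = r->origin + r->used;
    r->used += size;
    return 1;
}
```

With ISRAM described as origin 0x00000000 and length 0x00100000, a 0x400-byte stack is placed at address 0, a following section lands at 0x400, and a request for a full 0x00100000 bytes then fails because the region is no longer empty.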
DSP/BIOS Configure Tool (*.cdb)
ISRAM Properties
System memory description
DSP/BIOS Configure Tool (*.cdb)
Properties
Binding sections to memory
Program Cases :
Case 1 :
void main()
{
    int Image[1000];
    ….
}

int Image[1000];
void main()
{
    ….
}

stack = ?
stack 0x400 (1024)
Program Cases :
Case 2 :
void main()
{
    double Image[200000];
    ….
}

bss > SDRAM
stack 0x400 (1024)
bss < 0x100000 (1024k)

double Image[200000];
void main()
{
    ….
}
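One reading of the two cases is a size check against the stack and the internal memory. The sketch below works through that arithmetic in C. It assumes a 4-byte int and an 8-byte double, takes the stack and ISRAM sizes from the linker command file, and uses hypothetical helper names.

```c
#include <stdint.h>

#define STACK_SIZE 0x400u        /* -stack 0x400 (1024 bytes)      */
#define ISRAM_SIZE 0x00100000u   /* internal memory, 1024K         */

/* Bytes needed by an array of `count` elements of `elem_size` bytes. */
uint32_t array_bytes(uint32_t count, uint32_t elem_size)
{
    return count * elem_size;
}

/* Nonzero if a block of `bytes` fits within the configured stack. */
int fits_in_stack(uint32_t bytes)
{
    return bytes <= STACK_SIZE;
}

/* Nonzero if a block of `bytes` fits within internal memory. */
int fits_in_isram(uint32_t bytes)
{
    return bytes <= ISRAM_SIZE;
}
```

Under these assumptions, `int Image[1000]` needs 4000 bytes. That is too large for a 1024-byte stack when the array is local, but it fits easily in internal memory when it is global. `double Image[200000]` needs 1,600,000 bytes, which exceeds the 1024K of internal memory, so as a global it would have to be placed in SDRAM.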
Q&A