TMS320C6000 DSP Optimization Workshop
Chapter 10
Advanced Memory Management
Copyright © 2005 Texas Instruments. All rights reserved.
Technical Training Organization
Outline
Using Memory Efficiently:
1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache
Summary
Keep it On-Chip
[Diagram: CPU with program/data caches; internal SRAM holding .text and .bss; EMIF]
Using Memory Efficiently
1. If Possible …
Put all code / data on-chip:
- Best performance
- Easiest to implement
What if it doesn’t all fit?
How to use Internal Memory Efficiently
1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache
Use Multiple Sections
[Diagram: CPU with program/data caches; internal SRAM; EMIF to external memory]
Using Memory Efficiently
2. Use Multiple Sections
- Keep .bss (global vars) and critical code on-chip
- Put non-critical code and data off-chip
[Diagram labels: .text, .bss, critical mapped on-chip; .far, myVar mapped off-chip]
Making Custom Code Sections
#pragma CODE_SECTION(dotp, "critical");
int dotp(a, x)

Create a custom code section using:
#pragma CODE_SECTION(dotp, ".text:_dotp");
Or use the compiler's -mo option:
- -mo creates a subsection for each function
- Subsections are specified with ":"
To make a data section ...
Making Custom Data Sections
A special data section ...
#pragma DATA_SECTION(x, "myVar");
#pragma DATA_SECTION(y, "myVar");
int x[32];
short y;
Make a custom named data section
Special Data Section: “.far”
#pragma DATA_SECTION(m, ".far")
short m;
.far is a pre-defined section name
Three-cycle read (a pointer must be set before the read)
Add a variable to .far using one of:
1. The DATA_SECTION pragma
2. The far compiler option: -ml
3. The far keyword: far short m;
How do we link our own sections?
Linking Custom Sections
[Build flow: app.cdb → appcfg.cmd; appcfg.cmd + myLink.cmd → Linker → myApp.out]
How do I know which CMD file is executed first?
myLink.cmd
SECTIONS
{
    myVar:        > SDRAM
    critical:     > IRAM
    .text:_dotp:  > IRAM
}
Specifying Link Order
What if I forget to specify a section in SECTIONS?
Check for Unspecified Sections
In summary …
Use Multiple Sections
[Diagram: CPU with program/data caches; internal SRAM holding .text, .bss, critical; EMIF to external memory holding .far, myVar]
Using Memory Efficiently
2. Use Multiple Sections
- Keep .bss (global vars) and critical code on-chip
- Put non-critical code and data off-chip
- Create new sections with #pragma CODE_SECTION and #pragma DATA_SECTION
- You must make your own linker command file
Using Memory Efficiently
1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache
Dynamic Memory
[Diagram: internal SRAM now also holds the Stack; EMIF to external memory]
Using Memory Efficiently
3. Local Variables
- If the stack is located on-chip, all functions can “share” it
What is a stack?
What is the Stack?
[Diagram: stack located between address 0 and 0xFFFFFFFF]
A block of memory where the compiler stores:
- Local variables
- Intermediate results
- Function arguments
- Return addresses
Details of the C6000 stack ...
Stack and Stack Pointer
[Diagram: SP (B15) points to the top of stack; the stack grows from higher addresses toward lower addresses]
Details:
1. SP points to the first empty location
2. SP is double-word aligned before each function
3. Created by the compiler's init routine (boot.c)
4. Length defined by the -stack linker option
5. Stack length is not validated at runtime
Dynamic Memory
[Diagram: internal SRAM holds the Stack; external memory holds the Heap]
Using Memory Efficiently
3. Local Variables: if the stack is located on-chip, all functions can use it
4. Use the Heap
- Common memory-reuse mechanism within the C language
- A heap (i.e. system memory) lets you allocate, then free, chunks of memory from a common system block
For example …
Dynamic Example (Heap)
#define SIZE 32
int x[SIZE];   /* allocate   */
int a[SIZE];
x = {…};       /* initialize */
a = {…};
filter(…);     /* execute    */
“Normal” (static) C Coding
#define SIZE 32
x=malloc(SIZE);
a=malloc(SIZE);
x={…};
a={…};
filter(…);
free(a);
free(x);
“Dynamic” C Coding
Create
Execute
Delete
High-performance DSP users have traditionally used static embedded systems. As DSPs and compilers have improved, dynamic systems often allow enhanced flexibility (more threads) at lower cost.
Dynamic Memory
[Diagram: internal SRAM holds the Stack; external memory holds the Heap]
Using Memory Efficiently
3. Local Variables: if the stack is located on-chip, all functions can use it
4. Use the Heap: a heap (i.e. system memory) can be allocated, then freed, within the C language
What if I need two heaps? Say, a big image array off-chip, and a fast scratch-memory heap on-chip?
Multiple Heaps
[Diagram: internal SRAM holds the Stack and a second heap (Heap2); external memory holds the Heap]
Multiple Heaps with DSP/BIOS
- DSP/BIOS enables multiple heaps to be created
- Check the box & set the size when creating a MEM object
- By default, the heap has the same name as the MEM object; you can change it here
How can you allocate from multiple heaps?
MEM_alloc()
#define SIZE 32
x = MEM_alloc(IRAM, SIZE, ALIGN);
a = MEM_alloc(SDRAM, SIZE, ALIGN);
x = {…};
a = {…};
filter(…);
MEM_free(SDRAM,a,SIZE);
MEM_free(IRAM,x,SIZE);
Using MEM functions
#define SIZE 32
x=malloc(SIZE);
a=malloc(SIZE);
x={…};
a={…};
filter(…);
free(a);
free(x);
Standard C syntax
You can pick a specific heap
BUF Concepts
- Buffer pools contain a specified number of equal-size buffers
- Any number of pools can be created
- Buffers are allocated from a pool and freed back when no longer needed
- Buffers can be shared between applications
- Buffer-pool APIs are faster and smaller than malloc-type operations
- In addition, BUF_alloc and BUF_free are deterministic (unlike malloc)
- BUF APIs have no reentrancy or fragmentation issues
[Diagram: a POOL of equal-size BUFs; an SWI calls BUF_alloc, a TSK calls BUF_free; pools are created and deleted with BUF_create / BUF_delete]
GCONF Creation of a Buffer Pool
Creating a BUF:
1. Right-click on the BUF manager
2. Select “insert BUF”
3. Right-click on the new BUF
4. Select “rename”
5. Type the BUF name
6. Right-click on the new BUF
7. Select “properties”
8. Indicate the desired:
   - Memory segment
   - Number of buffers
   - Size of buffers
   - Alignment of buffers
Gray boxes indicate the effective pool and buffer sizes
Using Memory Efficiently
1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache
Use Memory Overlays
[Diagram: algo1 and algo2 overlaid in internal SRAM; EMIF to external memory]
Using Memory Efficiently
5. Use Memory Overlays
- Reuse the same memory locations for multiple algorithms (and/or data)
- You must copy the sections yourself
First, we need to make custom sections …
Create Sections to Overlay
#pragma CODE_SECTION(fir, ".FIR");
int fir(short *a, …)

#pragma CODE_SECTION(iir, "myIIR");
int iir(short *a, …)
myCode.C
How can we get them to run from the same location?
Where will they be originally loaded into memory?
The key is in the linker command file …
Load vs. Run Addresses
SECTIONS
{
    .FIR:  > IRAM                   /* load & run */
    myIIR: load = IRAM, run = IRAM
}
[Diagram: .FIR and myIIR placed in internal SRAM]
Simply directing a section into a MEM object indicates it both loads and runs from that location:
.FIR: > IRAM
Alternatively, you could use:
.FIR: load = IRAM, run = IRAM
In your own linker cmd file:
- load: where the fxn resides at reset
- run: tells the linker its runtime location
What if we wanted them be loaded to off-chip but run from on-chip memory?
Load vs. Run Addresses
Simply specify different addresses for load and run.
You must make sure they get copied (using memcpy or the DMA).
[Diagram: load addresses in external memory; run addresses in internal SRAM]
load: where the fxn resides at reset; run: tells the linker its runtime location
SECTIONS
{
    .FIR:  load = SDRAM, run = IRAM
    myIIR: load = SDRAM, run = IRAM
}
[Diagram: .FIR and myIIR load in external memory, run in internal SRAM]
Back to our original problem: what if we want them to run from the same address?
SECTIONS
{
    .FIR:  load = SDRAM, run = IRAM
    myIIR: load = SDRAM, run = IRAM
}
Combining Run Addresses with UNION
Above, we only force different load/run addresses.
Below, we also force them to share (union) run locations:
SECTIONS
{
    UNION run = IRAM
    {
        .FIR : load = EPROM
        myIIR: load = EPROM
    }
}
[Diagram: .FIR and myIIR load at separate external addresses but share one run address in internal SRAM]
How can we make the overlay procedure easier?
Using Copy Tables
SECTIONS
{
    UNION run = IRAM
    {
        .FIR : load = EPROM, table(_fir_copy_table)
        myIIR: load = EPROM, table(_iir_copy_table)
    }
}
typedef struct copy_record {
    unsigned int load_addr;
    unsigned int run_addr;
    unsigned int size;
} COPY_RECORD;

typedef struct copy_table {
    unsigned short rec_size;
    unsigned short num_recs;
    COPY_RECORD    recs[2];
} COPY_TABLE;
[Diagram: fir_copy_table and iir_copy_table each hold one copy record: load addr, run addr, size (32-bit fields)]
How do we use a Copy Table?
Using Copy Tables
#include <cpy_tbl.h>

extern far COPY_TABLE fir_copy_table;
extern far COPY_TABLE iir_copy_table;
extern void fir(void);
extern void iir(void);

main()
{
    copy_in(&fir_copy_table);
    fir();
    ...
    copy_in(&iir_copy_table);
    iir();
    ...
}
copy_in() provides a simple wrapper around mem_copy().
Better yet, use the DMA hardware to copy the sections; specifically, the DAT_copy() function.
What could be even easier than using Copy Tables?
Use Cache
[Diagram: CPU with program/data caches; internal cache holds copies of .text and .bss from external memory via the EMIF]
Using Memory Efficiently
6. Use Cache
- Works for code and data
- Keeps a local (temporary) scratch copy of info on-chip
- Commonly used, since once enabled it's automatic
- Discussed further in Chapter 14
Summary: Using Memory Efficiently
You may want to work through your memory allocations in the following order:
1. Keep it all on-chip
2. Use cache (more in Ch 15)
3. Use local variables (stack on-chip)
4. Use dynamic memory (heap, BUF)
5. Make your own sections (pragmas)
6. Overlay memory (load vs. run)
While this tradeoff is highly application-dependent, this is a good place to start.