

TMS320C6000 DSP Optimization Workshop

Chapter 10

Advanced Memory Management

Copyright © 2005 Texas Instruments. All rights reserved.

Technical Training Organization

Outline

Using Memory Efficiently
1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache

Summary

Keep it On-Chip

[Figure: CPU, program cache, data cache, internal SRAM, and EMIF, with sections .text and .bss]

Using Memory Efficiently

1. If possible, put all code and data on-chip
  • Best performance
  • Easiest to implement

What if it doesn’t all fit?

How to Use Internal Memory Efficiently

1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache

Use Multiple Sections

[Figure: CPU, program cache, data cache, internal SRAM, EMIF, and external memory, with sections .text, .bss, .far, critical, and myVar]

Using Memory Efficiently

2. Use Multiple Sections
  • Keep .bss (global vars) and critical code on-chip
  • Put non-critical code and data off-chip

Making Custom Code Sections

Create a custom code section using:

    #pragma CODE_SECTION(dotp, "critical");
    int dotp(a, x)

Or use the compiler’s -mo option:
  • -mo creates a subsection for each function
  • Subsections are specified with ":", for example:

    #pragma CODE_SECTION(dotp, ".text:_dotp");

To make a data section ...

Making Custom Data Sections

Make a custom named data section:

    #pragma DATA_SECTION (x, "myVar");
    #pragma DATA_SECTION (y, "myVar");
    int x[32];
    short y;

A special data section ...
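A minimal sketch of the two pragmas used together. It assumes the TI C6000 compiler, which predefines `_TMS320C6X`; on any other compiler the pragmas are skipped and the code still builds, but nothing is placed in a custom section. The three-argument signature of `dotp` and the parameter names are illustrative assumptions, since the slide shows only `dotp(a, x)`.

```c
#include <assert.h>
#include <stddef.h>

/* The pragmas must come before any declaration of the symbols they name. */
#ifdef _TMS320C6X
#pragma CODE_SECTION(dotp, "critical")   /* code section for dotp()   */
#pragma DATA_SECTION(x, "myVar")         /* data section for x and y  */
#pragma DATA_SECTION(y, "myVar")
#endif

int   x[32];
short y;

/* Dot product of two 16-bit vectors of length n (assumed signature). */
int dotp(const short *a, const short *b, int n)
{
    int sum = 0;
    int i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The section names only take effect once a linker command file directs `critical` and `myVar` into a memory region.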

Special Data Section: “.far”

  • .far is a pre-defined section name
  • Three-cycle read (pointer must be set before read)
  • Add a variable to .far using one of:

1. The DATA_SECTION pragma:

    #pragma DATA_SECTION(m, ".far")
    short m;

2. The far compiler option:

    -ml

3. The far keyword:

    far short m;

How do we link our own sections?

Linking Custom Sections

[Figure: “Build” flow from app.cdb through appcfg.cmd and myLink.cmd into the linker, producing myApp.out]

myLink.cmd:

    SECTIONS
    {
        myVar:       > SDRAM
        critical:    > IRAM
        .text:_dotp: > IRAM
    }

How do I know which CMD file is executed first?

Specifying Link Order

What if I forget to specify a section in SECTIONS?

Check for Unspecified Sections

In summary …

Use Multiple Sections

[Figure: CPU, program cache, data cache, internal SRAM, EMIF, and external memory, with sections .text, .bss, .far, critical, and myVar]

Using Memory Efficiently

2. Use Multiple Sections
  • Keep .bss (global vars) and critical code on-chip
  • Put non-critical code and data off-chip
  • Create new sections with:
      #pragma CODE_SECTION
      #pragma DATA_SECTION
  • You must make your own linker command file

Using Memory Efficiently

1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache

Dynamic Memory

[Figure: CPU, program cache, data cache, internal SRAM, EMIF, and external memory, with the stack marked]

Using Memory Efficiently

3. Local Variables
  • If the stack is located on-chip, all functions can “share” it

What is a stack?

What is the Stack?

A block of memory where the compiler stores:
  • Local variables
  • Intermediate results
  • Function arguments
  • Return addresses

[Figure: memory map from address 0 to 0xFFFFFFFF, with the top of stack marked]

Details of the C6000 stack ...

Stack and Stack Pointer

[Figure: memory map from address 0 (lower) to 0xFFFFFFFF (higher), showing the top of stack, the direction of stack growth, and SP held in register B15]

Details:
1. SP points to the first empty location
2. SP is double-word aligned before each function
3. Created by the compiler’s init routine (boot.c)
4. Length defined by the -stack linker option
5. Stack length is not validated at runtime
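Because the stack length is not validated at runtime, one common way to check it yourself is to fill the stack region with a known pattern at startup and later count how much of the pattern survives. The sketch below shows the idea on a plain array standing in for the stack region. The function names and the fill value are hypothetical, and it assumes the stack grows toward lower addresses, so the untouched words are at the low end of the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xC0FFEE00u   /* arbitrary marker value (assumption) */

/* Fill the whole region with the marker before the stack is used. */
void stack_paint(uint32_t *region, size_t words)
{
    size_t i;

    assert(region != NULL);
    for (i = 0; i < words; i++)
        region[i] = STACK_FILL;
}

/* Count marker words still intact, scanning up from the low end.
   A result of 0 suggests the stack has reached (or passed) its limit. */
size_t stack_unused_words(const uint32_t *region, size_t words)
{
    size_t n = 0;

    assert(region != NULL);
    while (n < words && region[n] == STACK_FILL)
        n++;
    return n;
}
```

On a real target the region would be the memory reserved by the -stack linker option, and the check would typically run from an idle loop or a debug command.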

Dynamic Memory

[Figure: CPU, program cache, data cache, internal SRAM, EMIF, and external memory, with the stack and heap marked]

Using Memory Efficiently

3. Local Variables
  • If the stack is located on-chip, all functions can use it

4. Use the Heap
  • Common memory reuse within the C language
  • A heap (i.e., system memory): allocate, then free chunks of memory from a common system block

For example …

Dynamic Example (Heap)

“Normal” (static) C coding:

    #define SIZE 32
    int x[SIZE];                      /* allocate   */
    int a[SIZE];
    x = {…};                          /* initialize */
    a = {…};
    filter(…);                        /* execute    */

“Dynamic” C coding:

    #define SIZE 32
    int *x, *a;
    x = malloc(SIZE * sizeof(int));   /* create     */
    a = malloc(SIZE * sizeof(int));
    x = {…};
    a = {…};
    filter(…);                        /* execute    */
    free(a);                          /* delete     */
    free(x);

  • High-performance DSP users have traditionally used static embedded systems
  • As DSPs and compilers have improved, dynamic systems often allow enhanced flexibility (more threads) at lower cost
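The create / execute / delete pattern can be sketched as a complete function. This is only an illustration under stated assumptions: `filter` here is a hypothetical stand-in (a sum of products), and `run_filter_dynamic` is a name invented for the example. The point it adds is the allocation check, which a dynamic system needs and a static one does not.

```c
#include <assert.h>
#include <stdlib.h>

#define SIZE 32

/* Hypothetical stand-in for the slide's filter(): sum of products. */
static long filter(const int *x, const int *a, int n)
{
    long acc = 0;
    int i;

    for (i = 0; i < n; i++)
        acc += (long)x[i] * a[i];
    return acc;
}

/* Returns 0 on success, -1 if either allocation failed. */
int run_filter_dynamic(long *result)
{
    int *x = malloc(SIZE * sizeof *x);      /* create  */
    int *a = malloc(SIZE * sizeof *a);
    int status = -1;
    int i;

    assert(result != NULL);
    if (x != NULL && a != NULL) {
        for (i = 0; i < SIZE; i++) {        /* initialize */
            x[i] = i;
            a[i] = 1;
        }
        *result = filter(x, a, SIZE);       /* execute */
        status = 0;
    }
    free(a);                                /* delete (free(NULL) is a no-op) */
    free(x);
    return status;
}
```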

Dynamic Memory

[Figure: CPU, program cache, data cache, internal SRAM, EMIF, and external memory, with the stack and heap marked]

Using Memory Efficiently

3. Local Variables
  • If the stack is located on-chip, all functions can use it

4. Use the Heap
  • Common memory reuse within the C language
  • A heap (i.e., system memory) can be allocated, then freed

What if I need two heaps? Say, a big image array off-chip, and a fast scratch memory heap on-chip?

Multiple Heaps

[Figure: CPU, program cache, data cache, internal SRAM, EMIF, and external memory, with Stack, Heap, and Heap2 marked]

DSP/BIOS enables multiple heaps to be created

Multiple Heaps with DSP/BIOS

  • DSP/BIOS enables multiple heaps to be created
  • Check the box and set the size when creating a MEM object
  • By default, the heap has the same name as the MEM object; you can change it in the MEM object’s properties

How can you allocate from multiple heaps?

MEM_alloc()

Standard C syntax:

    #define SIZE 32
    x = malloc(SIZE);
    a = malloc(SIZE);
    x = {…};
    a = {…};
    filter(…);
    free(a);
    free(x);

Using MEM functions (you can pick a specific heap):

    #define SIZE 32
    x = MEM_alloc(IRAM, SIZE, ALIGN);
    a = MEM_alloc(SDRAM, SIZE, ALIGN);
    x = {…};
    a = {…};
    filter(…);
    MEM_free(SDRAM, a, SIZE);
    MEM_free(IRAM, x, SIZE);
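To make the idea of picking a heap per allocation concrete without a DSP/BIOS target, here is a host-runnable sketch with two independent memory regions. It is not the DSP/BIOS MEM implementation: `Arena`, `arena_alloc`, `arena_reset`, `fast_heap`, and `big_heap` are hypothetical names, and the allocator is a simple bump allocator that releases everything at once rather than supporting per-block free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One region of memory handed out front to back. */
typedef struct {
    unsigned char *base;
    size_t         size;
    size_t         used;
} Arena;

/* Allocate size bytes aligned to align (a power of two).
   Returns NULL when the region cannot satisfy the request. */
void *arena_alloc(Arena *h, size_t size, size_t align)
{
    uintptr_t start, aligned;
    size_t offset;

    assert(h != NULL && align != 0 && (align & (align - 1)) == 0);
    start   = (uintptr_t)h->base + h->used;
    aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    offset  = (size_t)(aligned - (uintptr_t)h->base);
    if (offset > h->size || size > h->size - offset)
        return NULL;
    h->used = offset + size;
    return h->base + offset;
}

/* Release every block in the region at once. */
void arena_reset(Arena *h)
{
    assert(h != NULL);
    h->used = 0;
}

/* Stand-ins for a small fast on-chip heap and a large off-chip heap. */
static unsigned char fast_mem[256];
static unsigned char big_mem[4096];
Arena fast_heap = { fast_mem, sizeof fast_mem, 0 };
Arena big_heap  = { big_mem,  sizeof big_mem,  0 };
```

The caller chooses the region by passing `&fast_heap` or `&big_heap`, which mirrors how the segment argument selects IRAM or SDRAM in the MEM calls.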

BUF Concepts

  • Buffer pools contain a specified number of equal-size buffers
  • Any number of pools can be created
  • Buffers are allocated from a pool and freed back when no longer needed
  • Buffers can be shared between applications
  • Buffer pool APIs are faster and smaller than malloc-type operations
  • In addition, BUF_alloc and BUF_free are deterministic (unlike malloc)
  • BUF APIs have no reentrancy or fragmentation issues

[Figure: a pool of BUF buffers (BUF_create, BUF_delete), with a SWI and a TSK exchanging a buffer via BUF_alloc and BUF_free]
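The reason a pool of equal-size buffers can be deterministic and free of fragmentation is easiest to see in code. The sketch below is a generic fixed-size pool built on a free list, not the DSP/BIOS BUF module: `Pool`, `pool_init`, `pool_alloc`, and `pool_free` are hypothetical names, the sizes are arbitrary, and the code makes no attempt at thread safety.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BUFS     4     /* number of buffers in the pool (assumption) */
#define POOL_BUF_SIZE 64    /* size of each buffer in bytes (assumption)  */

/* A free buffer stores the link to the next free buffer in its own storage. */
typedef union PoolBuf {
    union PoolBuf *next;
    unsigned char  data[POOL_BUF_SIZE];
} PoolBuf;

typedef struct {
    PoolBuf  bufs[POOL_BUFS];
    PoolBuf *free_list;
} Pool;

/* Thread every buffer onto the free list. */
void pool_init(Pool *p)
{
    size_t i;

    assert(p != NULL);
    p->free_list = NULL;
    for (i = 0; i < POOL_BUFS; i++) {
        p->bufs[i].next = p->free_list;
        p->free_list = &p->bufs[i];
    }
}

/* Constant time: pop the head of the free list, or NULL if empty. */
void *pool_alloc(Pool *p)
{
    PoolBuf *b;

    assert(p != NULL);
    b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Constant time: push the buffer back onto the free list. */
void pool_free(Pool *p, void *buf)
{
    PoolBuf *b = buf;

    assert(p != NULL && buf != NULL);
    b->next = p->free_list;
    p->free_list = b;
}
```

Both operations touch a fixed number of pointers regardless of pool state, and because every buffer is the same size, a freed buffer always fits the next request.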

GCONF Creation of Buffer Pool

Creating a BUF:
1. Right-click on BUF mgr
2. Select “insert BUF”
3. Right-click on new BUF
4. Select “rename”
5. Type BUF name
6. Right-click on new BUF
7. Select “properties”
8. Indicate desired
  • Memory segment
  • Number of buffers
  • Size of buffers
  • Alignment of buffers

Gray boxes indicate effective pool and buffer sizes

Using Memory Efficiently

1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache

Use Memory Overlays

[Figure: CPU, program cache, data cache, internal SRAM, EMIF, and external memory, with algo1 and algo2 marked]

Using Memory Efficiently

5. Use Memory Overlays
  • Reuse the same memory locations for multiple algorithms (and/or data)
  • You must copy the sections yourself

First, we need to make custom sections …

Create Sections to Overlay

myCode.C:

    #pragma CODE_SECTION(fir, ".FIR");
    int fir(short *a, …)

    #pragma CODE_SECTION(iir, "myIIR");
    int iir(short *a, …)

  • How can we get them to run from the same location?
  • Where will they be originally loaded into memory?

The key is in the linker command file …

Load vs. Run Addresses

[Figure: internal SRAM and external memory, with .FIR and myIIR placed by the SECTIONS directive]

    SECTIONS
    {
        .FIR:  > IRAM                 /* load & run */
        myIIR: load=IRAM, run=IRAM
    }

  • Simply directing a section into a MEM object means it is both loaded to and run from the same location:

        .FIR: > IRAM

  • Alternatively, you could use:

        .FIR: load=IRAM, run=IRAM

In your own linker cmd file:
  • load: where the fxn resides at reset
  • run: tells the linker its runtime location

What if we wanted them to be loaded off-chip but run from on-chip memory?

Load vs. Run Addresses

  • Simply specify different addresses for load and run
  • You must make sure they get copied (using memcpy or the DMA)

    SECTIONS
    {
        .FIR:  load=SDRAM, run=IRAM
        myIIR: load=SDRAM, run=IRAM
    }

  • load: where the fxn resides at reset
  • run: tells the linker its runtime location

[Figure: internal SRAM and external memory, showing the load addresses and run addresses of .FIR and myIIR]

Back to our original problem: what if we want them to run from the same address?

Combining Run Addresses with UNION

In the first version, we only force different load/run addresses:

    SECTIONS
    {
        .FIR:  load=SDRAM, run=IRAM
        myIIR: load=SDRAM, run=IRAM
    }

In the second, we also force them to share (union) run locations:

    SECTIONS
    {
        UNION run = IRAM
        {
            .FIR : load = EPROM
            myIIR: load = EPROM
        }
    }

[Figure: internal SRAM and external memory, showing the load addresses and the shared run addresses]

How can we make the overlay procedure easier?

Using Copy Tables

    SECTIONS
    {
        UNION run = IRAM
        {
            .FIR : load = EPROM, table(_fir_copy_table)
            myIIR: load = EPROM, table(_iir_copy_table)
        }
    }

    typedef struct copy_record
    {
        unsigned int load_addr;
        unsigned int run_addr;
        unsigned int size;
    } COPY_RECORD;

    typedef struct copy_table
    {
        unsigned short rec_size;
        unsigned short num_recs;
        COPY_RECORD    recs[2];
    } COPY_TABLE;

[Figure: fir_copy_table holds the header values 3 and 1 followed by one copy record (fir load addr, fir run addr, fir size); iir_copy_table is laid out the same way for iir]

How do we use a Copy Table?

Using Copy Tables

    SECTIONS
    {
        UNION run = IRAM
        {
            .FIR : load = EPROM, table(_fir_copy_table)
            myIIR: load = EPROM, table(_iir_copy_table)
        }
    }

    #include <cpy_tbl.h>
    extern far COPY_TABLE fir_copy_table;
    extern far COPY_TABLE iir_copy_table;
    extern void fir(void);
    extern void iir(void);

    main()
    {
        copy_in(&fir_copy_table);
        fir();
        ...
        copy_in(&iir_copy_table);
        iir();
        ...
    }

  • copy_in() provides a simple wrapper around memcpy().
  • Better yet, use the DMA hardware to copy the sections; specifically, the DAT_copy() function.

What could be even easier than using Copy Tables?
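To show what a copy-in step does with a copy table, here is a host-runnable sketch of the idea: walk the records and copy each load image to its run address. It is not TI's cpy_tbl.h or its copy_in(): `OverlayRecord`, `OverlayTable`, and `overlay_copy_in` are hypothetical names, and the records hold pointers rather than `unsigned int` addresses so the code also works on hosts with 64-bit pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One section: where it is stored (load) and where it must execute (run). */
typedef struct {
    const void *load_addr;
    void       *run_addr;
    size_t      size;
} OverlayRecord;

/* A table is simply a counted list of records. */
typedef struct {
    size_t               num_recs;
    const OverlayRecord *recs;
} OverlayTable;

/* Copy every section in the table from its load address to its run address.
   Load and run regions are assumed not to overlap. */
void overlay_copy_in(const OverlayTable *t)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < t->num_recs; i++)
        memcpy(t->recs[i].run_addr, t->recs[i].load_addr, t->recs[i].size);
}
```

When two tables name the same run address, as the UNION arrangement produces, copying in the second image overwrites the first, so each algorithm must be copied in again before it is called.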

Use Cache

[Figure: CPU, program cache, data cache, internal cache, EMIF, and external memory, with sections .bss and .text]

Using Memory Efficiently

6. Use Cache
  • Works for code and data
  • Keeps a local (temporary) scratch copy of info on-chip
  • Commonly used, since once enabled it’s automatic
  • Discussed further in Chapter 14

Summary: Using Memory Efficiently

You may want to work through your memory allocations in the following order:

1. Keep it all on-chip
2. Use cache (more in Ch 15)
3. Use local variables (stack on-chip)
4. Use dynamic memory (heap, BUF)
5. Make your own sections (pragmas)
6. Overlay memory (load vs. run)

While this tradeoff is highly application dependent, this is a good place to start.
