Memory
10/9 - 2004
INF5060:Multimedia data communication using network processors
2004 Carsten Griwodz & Pål Halvorsen, INF5060 – multimedia communication using network processors
Overview
  Memory on the IXP cards
    Kinds of memory
    Its features
    Its accessibility
  Microengine assembler
  Memory management
Kinds of Memory
  Memory                                  Size            Location
  Microengine general purpose registers   128 registers   On chip
  StrongARM instruction cache             16 Kbytes       On chip
  StrongARM data cache                    8 Kbytes        On chip
  StrongARM mini cache                    512 bytes       On chip
  Scratch(pad)                            4 Kbytes        On chip
  Instruction store                       64 Kbytes       On chip
  FlashROM                                8 Mbytes
  SRAM                                    8 Mbytes
  SDRAM                                   256 Mbytes
IXP Functional Units
  [Figure: block diagram of the IXP network processor. Inside the chip: StrongARM Core, Microengines, SRAM Unit, SDRAM Unit, PCI Bus Unit and IX Bus Unit, connected by various busses. Outside the chip: host machine and PCI-to-PCI bridge on the PCI bus, SDRAM (up to 256 MB), SRAM (up to 8 MB), Flash ROM (up to 8 MB), memory-mapped I/O devices, and Ethernet MACs (or other IX devices) on the IX Bus. Bus labels in the figure: 64 bit/33 MHz, 64 bit/116 MHz, 32 bit/116 MHz, 64 bit/104 MHz.]
Kinds of Memory
  Physical memory on the IXP1200 is contiguous
  Memory in general is not byte-addressable
    Memory units emulate byte addressing for the StrongARM
  Big endian architecture
    StrongARM: big endian mode
    Microengines are big endian

  Memory type    Addressable data unit (bytes)   Relative access time (cycles)
  Scratch(pad)   4                               12-14
  SRAM           4                               16-20
  SDRAM          8                               32-40
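
To make the big endian convention concrete, here is a minimal C sketch of how a 32-bit longword maps onto four bytes when the byte at the lowest address is the most significant one. The helper names load_be32 and store_be32 are hypothetical and are not part of any IXP toolchain.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit longword from four bytes
 * stored in big endian order (lowest address = most significant byte). */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Hypothetical helper: store a 32-bit longword as four big endian bytes. */
void store_be32(uint8_t *p, uint32_t value)
{
    p[0] = (uint8_t)(value >> 24);
    p[1] = (uint8_t)(value >> 16);
    p[2] = (uint8_t)(value >> 8);
    p[3] = (uint8_t)value;
}
```

Code written this way does not depend on the byte order of the machine it is compiled for, which helps when the same packet-handling code is also tested on a little endian host.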
Terms
  Careful! Inconsistencies!
  Wording in Intel IXP manuals
    Word: 16 bit
    Longword: 32 bit
    Quadword: 64 bit
  Wording in StrongARM and other ARM manuals
    Halfword: 16 bit
    Word: 32 bit
Kinds of Memory
  Memory accessible to StrongARM
    Mapped into a single address space
  Memory accessible to microengines
    Individually mapped
    Separate assembler instructions for each kind
    SDRAM, Scratchpad, Microengine registers, SRAM

  StrongARM address space
  Start address   Device
  0000 0000       Device 0: SRAM Unit
  4000 0000       Device 1: PCI Unit
  8000 0000       Device 2: Reserved
  9000 0000       Device 3: StrongARM Core System
  A000 0000       Device 4: Reserved
  B000 0000       Device 5: AMBA Translation Unit
  C000 0000       Device 6: SDRAM Unit (up to FFFF FFFF)
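
The single StrongARM address space can be illustrated with a small C sketch that maps an address to the device that decodes it. It assumes that each device's region runs from its start address up to the next boundary in the map. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for the seven devices in the StrongARM memory map. */
typedef enum {
    DEV_SRAM_UNIT,         /* Device 0 */
    DEV_PCI_UNIT,          /* Device 1 */
    DEV_RESERVED_2,        /* Device 2 */
    DEV_STRONGARM_SYSTEM,  /* Device 3 */
    DEV_RESERVED_4,        /* Device 4 */
    DEV_AMBA_TRANSLATION,  /* Device 5 */
    DEV_SDRAM_UNIT         /* Device 6 */
} ixp_device;

/* Assumption: a device owns every address from its start address up to
 * the byte before the next start address in the map. */
ixp_device device_for_address(uint32_t addr)
{
    if (addr < 0x40000000u) return DEV_SRAM_UNIT;
    if (addr < 0x80000000u) return DEV_PCI_UNIT;
    if (addr < 0x90000000u) return DEV_RESERVED_2;
    if (addr < 0xA0000000u) return DEV_STRONGARM_SYSTEM;
    if (addr < 0xB0000000u) return DEV_RESERVED_4;
    if (addr < 0xC0000000u) return DEV_AMBA_TRANSLATION;
    return DEV_SDRAM_UNIT;
}
```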
Memory: memory, cache memory, registers
  StrongARM core caches
  Microengine registers
  SDRAM
  SRAM
  IX Bus Unit: Scratch(pad) memory
StrongARM
StrongARM Core Features
  A general purpose processor
    With MMU
  16 Kbytes instruction cache
    Round robin replacement
  8 Kbytes data cache
    Round robin replacement
    Write-back cache, cache replacement on read, not on write
  512 byte mini-cache for data that is used once and then discarded
    To reduce flushing of the main data cache
  Instruction code stored in SDRAM
StrongARM Core Access
  Full access to
    SDRAM Unit
    SRAM Unit, incl. FlashROM
    PCI Bus Unit
  Access to microengine’s
    Program code
    Status registers
    Program counters
  Access to IX bus unit’s
    Status registers
    Scratch memory
  [Figure: StrongARM Core and its paths to the SRAM Unit, SDRAM Unit, PCI Bus Unit, IX Bus Unit and Microengine]
Microengines
Microengine Features
  4 hardware contexts
  2K x 32 bit instruction control store
    Every instruction is 32 bits long
    No instruction cache
    Instructions downloaded onto the microengine by the StrongARM
    Not loaded from RAM on demand
  5-stage instruction pipeline
    Blocks for reference operations
    Deferred execution to reduce context switch penalty
  256 registers
    32 bit registers
  Load and store architecture
    Must bring data into registers, work, write to destination
    Single cycle access in registers
    Use “reference command” to fetch into registers
    Yield/sleep during fetch execution
Microengine Access
  Full access to
    SDRAM Unit
    SRAM Unit
    IX Bus Unit
  Access to StrongARM
    Interrupts
    Trigger status register reads
  Access to PCI bus unit
    Initiate DMA with SDRAM
  Access to other microengines
    None
  Access to self
    Inter-thread signaling
    No access to own instruction code
  [Figure: Microengine and its paths to the SRAM Unit, SDRAM Unit, PCI Bus Unit, IX Bus Unit and StrongARM Core]
Microengine Registers
From: IXP1200 Family Hardware Reference Manual
Microengine Registers
  256 registers
  128 general purpose registers
    Arranged in two banks A and B
    Instructions with 2 input registers
      From different banks
      Otherwise assembler warning
  128 transfer registers
    Transfer registers are not general purpose registers
    Ports to their neighboring functional unit
    64 SDRAM transfer registers
      Transfer to and from SDRAM
      32 read / 32 write
    64 SRAM transfer registers
      Transfer to and from everything but SDRAM
      32 read / 32 write
    4 busses can be used in parallel
      By different threads
  Loading transfer registers
    64 bytes at once from one functional unit to another
    128 bytes at once from the IX bus
SDRAM
General features
  Recommended use
    StrongARM instruction code
    Large data structures
    Packets during processing
  64-bit addressed (8 byte aligned, quadword aligned)
  256 Mbytes
  928 Mbytes/s peak bandwidth
    Higher bandwidth than SRAM
    Higher latency than SRAM
  Access
    StrongARM
    Microengines
    StrongARM takes precedence
    PCI DMA on behalf of microengines
    Direct access to IX Bus Unit’s Transmit and Receive FIFO
Special features
  Byte, word, longword access supported through a read-modify-write access to quadwords
    Speed penalty
  Direct path from SDRAM to IX Bus Transmit and Receive FIFOs
    Controlled by microengines
    Up to 64 bytes transferable without microengine involvement
  Byte aligner between SDRAM and IX Bus
    For sending to the Transmit FIFO
    Shift bytewise when e.g. header length has changed
    Can only be used by microengines in the t_fifo_wr command
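
The read-modify-write access mentioned above can be sketched in C. This is a software model only, not the hardware implementation: SDRAM is modeled as an array of 64-bit quadwords, and the function name sdram_write_byte is hypothetical. It assumes big endian lane order, so byte 0 of a quadword is its most significant byte.

```c
#include <stdint.h>

/* Model: write one byte into a memory that can only transfer whole
 * 64-bit quadwords. The quadword is read, one byte lane is replaced,
 * and the whole quadword is written back. */
void sdram_write_byte(uint64_t *sdram, uint32_t byte_addr, uint8_t value)
{
    uint32_t quad_index = byte_addr >> 3;   /* 8 bytes per quadword */
    unsigned lane = byte_addr & 7u;         /* byte position inside the quadword */
    unsigned shift = (7u - lane) * 8u;      /* assumption: lane 0 is most significant */

    uint64_t quad = sdram[quad_index];                  /* read */
    quad &= ~((uint64_t)0xFF << shift);                 /* modify: clear the lane */
    quad |= (uint64_t)value << shift;                   /* modify: insert new byte */
    sdram[quad_index] = quad;                           /* write */
}
```

The extra read before every sub-quadword write is where the speed penalty comes from.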
SRAM
General features
  Recommended use
    Lookup tables
    Free buffer lists
    Data buffer queue lists
  32-bit addressed (4 byte aligned, word aligned)
  8 Mbytes
  464 Mbytes/s peak bandwidth
    Lower bandwidth than SDRAM
    Lower latency than SDRAM
  Access
    StrongARM
    Microengines
    StrongARM takes precedence
Accessing SRAM
  StrongARM access
    Byte, word and longword access
    Bit operations through SRAM Alias Address Space
    Bit, byte, word write supported through read-modify-write
  Microengine access
    Bit and longword access only
    Up to 8 longwords with one command
    Bit write supported through read-modify-write
    Bit operations within instructions
Special features
  Atomic push/pop operations
    For maintaining lists
    8 entry push/pop register list
    Microengines
      Named commands
    StrongARM
      Dedicated memory addresses
      Don’t cache these memory areas
  Atomic bit test, set and clear
    For synchronized access
    Microengine
      Use a write transfer register
      Specify bits to test, read, or write
      Reading the bit changes the write transfer register
    StrongARM
      Special macros for read-modify-write operations
      Blocks until operation is completed
      Don’t cache this memory
Special features
  8 entry CAM (content addressable memory) for read locks
    For synchronized access
    8 concurrent locks on memory
    Protect from StrongARM and microengines
    Read, unlock and write_unlock
    Microengines
      sram assembler command
      Waits until the lock is released
    StrongARM
      3 separate 8 MByte mapped memory regions
      Failed locking is indicated by flags, read always successful
      Don’t cache these memory areas
StrongARM Core Memory Map
  Device 0 (SRAM Unit, from 0000 0000) in detail:
Slow Port 3840 0000 – 385F FFF
Command FIFO Test 3800 0080 – 3800 00FF
SRAM CSRs 3800 0000 – 3800 0028
List 7 Pop operations 2780 0000 – 27FF FFFF
List 6 Pop operations 2700 0000 – 277F FFFF
List 5 Pop operations 2680 0000 – 26FF FFFF
List 4 Pop operations 2600 0000 – 267F FFFF
List 3 Pop operations 2580 0000 – 25FF FFFF
List 2 Pop operations 2500 0000 – 257F FFFF
List 1 Pop operations 2480 0000 – 24FF FFFF
List 0 Pop operations 2400 0000 – 247F FFFF
List 7 Push operations 2380 0000 – 23FF FFFF
List 6 Push operations 2300 0000 – 237F FFFF
List 5 Push operations 2280 0000 – 22FF FFFF
List 4 Push operations 2200 0000 – 227F FFFF
List 3 Push operations 2180 0000 – 21FF FFFF
List 2 Push operations 2100 0000 – 217F FFFF
List 1 Push operations 2080 0000 – 20FF FFFF
List 0 Push operations 2000 0000 – 207F FFFF
Bit Test & Set 1980 0000 – 19FF FFFF
Bit Test & Clear 1900 0000 – 197F FFFF
Bit Write Set 1880 0000 – 18FF FFFF
Bit Write Clear 1800 0000 – 187F FFFF
CAM Unlock 1600 0000 – 167F FFFF
Write Unlock 1400 0000 – 147F FFFF
Read Lock 1200 0000 – 127F FFFF
Read/Write 1000 0000 – 107F FFFF
BootROM 0000 0000 – 007F FFFF
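
The push and pop windows in this map follow a regular pattern: each of the 8 lists has its own window, and consecutive windows start 0080 0000 bytes apart. A hedged C sketch of that address arithmetic, with hypothetical names, could look like this:

```c
#include <stdint.h>

/* Values read off the memory map above; the names are hypothetical. */
#define SRAM_PUSH_BASE    0x20000000u  /* List 0 push operations */
#define SRAM_POP_BASE     0x24000000u  /* List 0 pop operations */
#define SRAM_LIST_STRIDE  0x00800000u  /* distance between two list windows */
#define SRAM_LIST_COUNT   8u

typedef enum { SRAM_LIST_PUSH, SRAM_LIST_POP } sram_list_op;

/* Compute the first StrongARM address of the push or pop window of a list.
 * Returns 0 on success, -1 if the list number is out of range. */
int sram_list_window(sram_list_op op, unsigned list, uint32_t *out_base)
{
    if (list >= SRAM_LIST_COUNT)
        return -1;
    uint32_t base = (op == SRAM_LIST_PUSH) ? SRAM_PUSH_BASE : SRAM_POP_BASE;
    *out_base = base + list * SRAM_LIST_STRIDE;
    return 0;
}
```

On the real hardware these windows must not be cached, as noted on the special features slide.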
Memory Map for SRAM addresses
  Physical   Function          StrongARM address space   Microengine sram              Microengine address space
  device                       (byte addressing)         instruction command           (longword addressing)
  SlowPort   Slow Port         3840 0000 – 385F FFF      read/write                    70 0000 – 7F FFFF
  SRAM CSRs  SRAM CSRs         3800 0000 – 3800 0013     read/write                    60 0000 – 60 0080
  SRAM       Pop operations    2400 0000 – 27FF FFFF     pop                           00 0000 – 1F FFFF
  SRAM       Push operations   2000 0000 – 23FF FFFF     push                          00 0000 – 1F FFFF
  SRAM       Bit Test & Set    1980 0000 – 19FF FFFF     bit_wr (test_and_set_bits)    00 0000 – 1F FFFF
  SRAM       Bit Test & Clear  1900 0000 – 197F FFFF     bit_wr (test_and_clear_bits)  00 0000 – 1F FFFF
  SRAM       Bit Write Set     1880 0000 – 18FF FFFF     bit_wr (set_bits)             00 0000 – 1F FFFF
  SRAM       Bit Write Clear   1800 0000 – 187F FFFF     bit_wr (clear_bits)           00 0000 – 1F FFFF
  SRAM       Unlock            1600 0000 – 167F FFFF     unlock                        00 0000 – 1F FFFF
  SRAM       Write Unlock      1400 0000 – 147F FFFF     write_unlock                  00 0000 – 1F FFFF
  SRAM       Read Lock         1200 0000 – 127F FFFF     read_lock                     00 0000 – 1F FFFF
  SRAM       Read/Write        1000 0000 – 107F FFFF     read/write                    00 0000 – 1F FFFF
  BootROM    BootROM           0000 0000 – 007F FFFF     read/write                    20 0000 – 3F FFFF
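
The Read/Write row shows the relation between the two views of SRAM: the StrongARM uses byte addresses in 1000 0000 – 107F FFFF, while the microengines use longword addresses in 00 0000 – 1F FFFF. A minimal C sketch of the conversion, assuming a plain linear mapping and using hypothetical function names:

```c
#include <stdint.h>

#define SA_SRAM_RW_BASE 0x10000000u  /* first StrongARM byte address of plain SRAM */
#define SA_SRAM_RW_LAST 0x107FFFFFu  /* last StrongARM byte address of plain SRAM */

/* Convert a StrongARM byte address in the SRAM Read/Write window into a
 * microengine longword address. Returns 0 on success, -1 if the address
 * lies outside the window or is not aligned to 4 bytes. */
int sa_to_me_sram_addr(uint32_t sa_addr, uint32_t *me_addr)
{
    if (sa_addr < SA_SRAM_RW_BASE || sa_addr > SA_SRAM_RW_LAST)
        return -1;
    if ((sa_addr & 3u) != 0u)
        return -1;                       /* microengines address whole longwords */
    *me_addr = (sa_addr - SA_SRAM_RW_BASE) >> 2;
    return 0;
}

/* Inverse direction: microengine longword address to StrongARM byte address. */
uint32_t me_to_sa_sram_addr(uint32_t me_addr)
{
    return SA_SRAM_RW_BASE + (me_addr << 2);
}
```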
IX Bus Unit
“FBI” Engine Interface
  [Figure: the IX Bus Unit inside the IXP network processor. Labels in the figure: Transmit FIFO, Receive FIFO, Hash Units, Status Registers and Scratchpad inside the IX Bus Unit; SDRAM Unit, Microengines and StrongARM on the chip side; Ethernet MAC (other IX devices) on the IX Bus.]
Scratch Memory: General Features
  Recommended use
    Passing messages between processors and between threads
    Semaphores, mailboxes, other IPC
  32-bit addressed (4 byte aligned, word aligned)
  4 Kbytes
  Has an atomic autoincrement instruction
    Only usable by microengines
StrongARM Core Memory Map
  Device 5 (AMBA Translation Unit, from B000 0000) in detail:
  Function              StrongARM address
  Scratchpad Memory     B004 4000 – B004 4FFF
  IX Bus Unit CSR       B004 0000
  ME5 Transfer Regs     B000 6800
  ME4 Transfer Regs     B000 6000
  ME3 Transfer Regs     B000 5800
  ME2 Transfer Regs     B000 5000
  ME1 Transfer Regs     B000 4800
  ME0 Transfer Regs     B000 4000
  ME5 CSR               B000 2800
  ME4 CSR               B000 2000
  ME3 CSR               B000 1800
  ME2 CSR               B000 1000
  ME1 CSR               B000 0800
  ME0 CSR               B000 0000
  ME = microengine
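
The per-microengine entries in this map are spaced 0800 bytes apart, so the StrongARM address of a microengine's CSR or transfer register block can be computed rather than looked up. The following C sketch shows that arithmetic; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Values read off the memory map above; the names are hypothetical. */
#define ME_CSR_BASE   0xB0000000u  /* ME0 CSR */
#define ME_XFER_BASE  0xB0004000u  /* ME0 Transfer Regs */
#define ME_STRIDE     0x00000800u  /* distance between two microengines */
#define ME_COUNT      6u           /* ME0 .. ME5 */

/* Start address of the CSR block of microengine `me`, or 0 if out of range. */
uint32_t me_csr_base(unsigned me)
{
    return (me < ME_COUNT) ? ME_CSR_BASE + me * ME_STRIDE : 0u;
}

/* Start address of the transfer register block of microengine `me`,
 * or 0 if out of range. */
uint32_t me_xfer_base(unsigned me)
{
    return (me < ME_COUNT) ? ME_XFER_BASE + me * ME_STRIDE : 0u;
}
```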
Microengine Assembler
Using Microengine Registers
  Programming
    Context-relative addressing
      Each thread can have its own window of registers (one fourth of the total), so threads can’t overwrite each other’s registers
    Absolute addressing
      Register is visible to all threads
    Context-relative vs. absolute addressing
      Decided on a per-instruction basis
  Assembler
    Supports symbolic names
    Assigns registers from the different kinds
  Programmer
    Must take care concerning the number of registers used
    Can hint the assembler to assign (transfer) registers contiguously
  Context-relative addressing of the registers
    Threads are only able to address their own register share
    This is more typically used
    Assembler notations
      symbolic_register_name – general purpose register
      $symbolic_register_name – SRAM transfer register
      $$symbolic_register_name – SDRAM transfer register
  Absolute addressing
    Threads can use more than their share of registers
    Threads can communicate via registers
    Assembler notations
      @symbolic_register_name – general purpose register
      @$symbolic_register_name – SRAM transfer register
      @$$symbolic_register_name – SDRAM transfer register
Microengine Assembler
  ALU
    alu[dest_reg, A_operand, alu_op, B_operand]
    Perform addition, subtraction, bit operations
    dest_reg
      Transfer register (TR), general purpose register (GPR) or nothing
    A_operand
      TR, GPR, immediate data, or nothing
    B_operand
      TR, GPR, or immediate data
  ALU_SHF
    alu_shf[dest_reg, A_operand, alu_op, B_operand, B_op_shift_cnt]
    Like ALU, but shift B_operand before evaluation
    dest_reg
      Context-relative TR, GPR, or nothing
    A_operand
      TR, GPR, immediate data, or nothing
    B_operand
      TR, GPR, or immediate data
Microengine Assembler
  BR_BCLR, BR_BSET
    br_bclr[reg, bit_position, label#]
    Branch if the given bit (0-31) in register reg is cleared or set, respectively
    reg
      Context-relative TR or GPR
  BR=BYTE, BR!=BYTE
    br=byte[reg, byte_spec, byte_compare_value, label#]
    Branch if the indicated byte (0-3) of register reg equals the constant value byte_compare_value, or does not, respectively
    reg
      Context-relative TR or GPR
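
The branch conditions above can be modeled in C as plain predicates on a 32-bit register value. This is only a sketch of the tests, not of the branch itself. The function names are hypothetical, and it assumes that byte_spec 0 selects the least significant byte, which the slide does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition of br_bclr: bit `bit` of reg is 0. Callers must pass 0..31. */
bool br_bclr_taken(uint32_t reg, unsigned bit)
{
    return ((reg >> bit) & 1u) == 0u;
}

/* Condition of br_bset: bit `bit` of reg is 1. Callers must pass 0..31. */
bool br_bset_taken(uint32_t reg, unsigned bit)
{
    return ((reg >> bit) & 1u) != 0u;
}

/* Condition of br=byte: the selected byte of reg equals `value`.
 * Assumption: byte_spec 0 is the least significant byte, 3 the most
 * significant one. Callers must pass 0..3. */
bool br_eq_byte_taken(uint32_t reg, unsigned byte_spec, uint8_t value)
{
    return (uint8_t)(reg >> (8u * byte_spec)) == value;
}
```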
Microengine Access to SDRAM
  Read, write, Receive FIFO read, Transmit FIFO write
    sdram[sdram_cmd, $$sdram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
  Parameters
    sdram_cmd
      read: read from SDRAM to TRs
      write: write from TRs to SDRAM
      r_fifo_rd: read from Receive FIFO to SDRAM
      t_fifo_wr: write to Transmit FIFO from SDRAM
    $$sdram_xfer_reg
      The first of a set of contiguous TRs for read and write operations
      One ref_count requires two TRs
    source_op_1/2
      Specifies the address to read from or to write to
    ref_count
      Values between 1 and 8 are valid
    optional_token
      ctx_arb allows other threads to run until memory operation is complete
      ctx_swap switches context to the next thread
      The (complicated) indirect_ref option must be used with r_fifo_rd and t_fifo_wr
Microengine Access to SRAM (1/2)
  Read, write, read and lock, write and unlock, unlock, …
  sram[sram_cmd, $sram_xfer_reg, source_op_1, source_op_2, ref_count] optional_token
    sram_cmd
      read or write
    $sram_xfer_reg
      The first of ref_count contiguous TRs
    source_op_1+source_op_2
      Specifies the address to read from or to write to
    ref_count
      The number of longwords read or written
  sram[read_lock, $sram_xfer_reg, source_op_1, source_op_2, ref_count] optional_token
    Like sram[read, …]
    But lock the address source_op_1+source_op_2
  sram[write_unlock, $sram_xfer_reg, source_op_1, source_op_2, 1] optional_token
    Write one TR to source_op_1+source_op_2 and unlock the address
  sram[unlock, --, source_op_1, source_op_2, 1] optional_token
    Unlock the address specified by source_op_1+source_op_2
Microengine Access to SRAM (2/2)
  …, bit operations, push, pop
  sram[bit_wr, $bit_mask, source_op_1, source_op_2, bit_op] optional_token
    As with scratch memory but with the larger address space
    $bit_mask is a write TR that holds the mask on input and, optionally, the result
  sram[push, --, source_op_1, source_op_2, queue_num] optional_token
    Add source_op_1 and source_op_2 to get an address
    Push the address onto queue queue_num
  sram[pop, $popped_list, --, --, queue_num] optional_token
    Pop an address from queue queue_num
    Store the pointer in the TR $popped_list
Microengine Access to Scratch Memory
  Read, write, bit operations, in-place increment
  scratch[bit_wr, $sram_xfer_reg, source_op_1, source_op_2, bit_op], optional_token
    Bit operations
  scratch[read, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
    Read into transfer registers
  scratch[write, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token
    Write from transfer registers
  scratch[incr, --, source_op_1, source_op_2, 1], optional_token
    In-place increment by 1
  Parameters
    source_op_1/2
      Context-relative transfer registers (TRs) or immediate values
      Sum between 0 and 1023
    $sram_xfer_reg
      For read and write: the first of a set of contiguous TRs to be read or written
      For bit_wr: a TR containing a bit mask
    ref_count
      Number of longwords read or written
      Between 1 and 8
    bit_op
      set_bits, clear_bits, test_and_set_bits, test_and_clear_bits
      For the test_ operations, the write TR is modified
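
The four bit_op variants can be modeled in C as follows. This is a sequential sketch of the semantics only; on the hardware the operation is atomic. The names are hypothetical, and the model assumes that for the test_ operations the value written back into the transfer register is the longword as it was before the operation.

```c
#include <stdint.h>

/* Hypothetical names for the four bit_op values. */
typedef enum {
    SET_BITS,
    CLEAR_BITS,
    TEST_AND_SET_BITS,
    TEST_AND_CLEAR_BITS
} bit_op;

/* Model of bit_wr on one longword. `xfer_reg` holds the bit mask on
 * entry. For the test_ variants it receives the previous contents of
 * the longword (assumption, see lead-in); otherwise it is unchanged. */
void bit_wr_model(uint32_t *location, uint32_t *xfer_reg, bit_op op)
{
    uint32_t mask = *xfer_reg;
    uint32_t old = *location;

    if (op == SET_BITS || op == TEST_AND_SET_BITS)
        *location = old | mask;      /* set every bit selected by the mask */
    else
        *location = old & ~mask;     /* clear every bit selected by the mask */

    if (op == TEST_AND_SET_BITS || op == TEST_AND_CLEAR_BITS)
        *xfer_reg = old;             /* report the state before the change */
}
```

A thread could use test_and_set_bits on a single bit as a lock: if the returned old value already has the bit set, another thread holds the lock.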
Microengine Assembler
  Ordering problems
    Example
      immed[$$temp, 0x1234]
      sdram[write,$$temp,base,0,1], ctx_swap, defer[1]
      immed[$$temp,0x5678]
    The wrong value may be written
      Writing and context swapping are deferred
      The register modification may overtake
  Address of a register
    It is possible to determine the address of a register
      .local a_gp_reg
      immed[a_gp_reg,&$an_sram_reg]
      .endlocal
Memory Management
Resource Manager
  Task
    Used by StrongARM code
    For microACEs and microACE applications to interface with microengines
  API
    Load code into microengines
    Enable/disable microengines
    Get/set microengine configuration and resource assignment
    Send and receive packets to and from microcode blocks
    Allocate and access uncached SRAM, SDRAM and Scratch memory
Resource Manager
  Data structures
    RmMemoryHandle
      Opaque handle identifying memory allocated by the resource manager
      typedef int RmMemoryHandle
Resource Manager
  RmMalloc
    Allocate a particular kind of memory
      RM_SRAM
      RM_SDRAM
      RM_SCRATCH
    Some SRAM and SDRAM is already used by the ASL, some SDRAM is used by Linux, the rest can be used freely by microACEs for data structures of their choosing
    The memory is not cached
    The memory is not protected by an MMU, and the virtual address is the same for all processes
    Returned pointers are always aligned (SDRAM to 8 bytes, SRAM and Scratch to 4 bytes)
    Requested sizes are rounded to alignment
    This allocation is not efficient
      microACEs should allocate all memory they need at once and manage it themselves
    ix_error RmMalloc(
        RmMemoryType in_memory_type,
        unsigned char* out_mem_handle_ptr,
        int in_size_in_bytes );
  RmFree
    Release memory allocated by RmMalloc
    ix_error RmFree( unsigned char* ptr );
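
The advice to allocate everything at once and manage it locally can be sketched as a simple bump allocator over one large block. The sketch below does not call the Resource Manager; the block passed to pool_init stands in for memory obtained once from RmMalloc, and all names (pool, pool_init, pool_alloc) are hypothetical. It assumes the block itself is already aligned, as the slide says RmMalloc results are.

```c
#include <stddef.h>

/* A pool that hands out pieces of one pre-allocated block. */
typedef struct {
    unsigned char *base;   /* start of the block (assumed already aligned) */
    size_t size;           /* total size of the block in bytes */
    size_t used;           /* bytes handed out so far */
    size_t align;          /* 8 for SDRAM, 4 for SRAM and Scratch; must be > 0 */
} pool;

void pool_init(pool *p, unsigned char *block, size_t size, size_t align)
{
    p->base = block;
    p->size = size;
    p->used = 0;
    p->align = align;
}

/* Hand out `n` bytes, rounded up to the pool alignment.
 * Returns NULL when the pool is exhausted. Pieces are never freed
 * individually; the whole block is released at once by its owner. */
void *pool_alloc(pool *p, size_t n)
{
    size_t rounded = (n + p->align - 1) / p->align * p->align;
    if (rounded < n || rounded > p->size - p->used)
        return NULL;
    void *result = p->base + p->used;
    p->used += rounded;
    return result;
}
```

Because every piece is rounded to the alignment, all returned pointers stay aligned as long as the block start is aligned.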
Resource Manager
  Translating between virtual and physical addresses
    The microengines map memory differently into their address space than the StrongARM
    StrongARM addresses make no sense to the microengines and have to be translated to offsets from the start of each particular kind of memory (and back)
  RmGetPhysOffset
    ix_error RmGetPhysOffset(
        RmMemoryType in_memory_type,
        unsigned char* in_data_ptr,
        unsigned int* out_offset );
    Translate address in_data_ptr in RmMalloc’d memory to its offset from the start of the given memory type
    The offset is in words (4 byte units) for SRAM and Scratch, and in quadwords (8 byte units) for SDRAM
  RmGetVirtualAddress
    ix_error RmGetVirtualAddress(
        RmMemoryType in_memory_type,
        unsigned char** out_buffer_ptr,
        unsigned int in_offset);
    Take the physical offset from the base of the given memory type and translate it into a virtual address valid for the StrongARM
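
The unit conversion behind these two calls can be illustrated with a stand-alone C model. It does not use the Resource Manager. The base pointer is passed in explicitly, and the names mem_type, phys_offset and virtual_address are hypothetical stand-ins, not the SDK's API.

```c
#include <stddef.h>

/* Hypothetical stand-in for RmMemoryType. */
typedef enum { MEM_SRAM, MEM_SDRAM, MEM_SCRATCH } mem_type;

/* Size of one addressable unit: 8 bytes for SDRAM, 4 bytes otherwise. */
static size_t unit_bytes(mem_type type)
{
    return (type == MEM_SDRAM) ? 8u : 4u;
}

/* Model of the virtual-to-offset direction: distance of `ptr` from `base`,
 * counted in units of the memory type. Returns 0 on success, -1 if `ptr`
 * lies before `base` or is not aligned to the unit size. */
int phys_offset(mem_type type, const unsigned char *base,
                const unsigned char *ptr, unsigned int *out_offset)
{
    size_t unit = unit_bytes(type);
    if (ptr < base)
        return -1;
    size_t diff = (size_t)(ptr - base);
    if (diff % unit != 0)
        return -1;
    *out_offset = (unsigned int)(diff / unit);
    return 0;
}

/* Model of the offset-to-virtual direction. */
unsigned char *virtual_address(mem_type type, unsigned char *base,
                               unsigned int offset)
{
    return base + (size_t)offset * unit_bytes(type);
}
```

The same byte distance therefore yields different offsets depending on the memory type: 16 bytes is offset 4 in SRAM or Scratch but offset 2 in SDRAM.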
Summary
  Memory on the IXP cards
    Kinds of memory
    Its features
    Its accessibility
  Microengine assembler
  Resource Manager functions