Support Across The Board™
Blackfin Speedway Presentation: Core, Memory, and Peripherals
Copyright © Avnet, Inc., Analog Devices, Inc. All rights reserved.
Blackfin as a Convergent Processor
Commonly asked questions:
What makes Blackfin a “convergent” processor?
What architectural features enable convergent processing?
What type of performance can Blackfin achieve from a networking standpoint?
Agenda
• Blackfin “Convergent Processing”
• Blackfin Core Details
– Registers
– ALU, MAC, Shifter
– Sequencer, Pipeline, Event Controller
• Blackfin Memory
– Memory Architecture
– Cache
• Peripherals
– General Peripherals (UART, SPORT, SPI, TWI, WD, RTC)
– Ethernet, CAN
– PPI
– DMA
What architectural features enable convergent processing?
• Integrated instruction set architecture
– Single instruction set for signal processing and control
• Programmable interrupt levels
– Real-time tasks get the highest priority level
• Memory protection with an MMU
– Regions of memory can be protected from access
• Networking peripherals in addition to high-speed connectivity to ADC, DAC, and video peripherals
• Unified, byte-addressable address space
• Support for User and Supervisor modes
• Robust ALU including both signal processing functions and traditional MCU/MPU functions
What makes Blackfin a Convergent Processor?
• Blackfin has a mature compiler that produces highly optimized code (with an option to produce “dense code” for control applications)
• Blackfin processors come with a full suite of C-based device drivers for peripherals
– Fully documented, common APIs
• Blackfin outperforms competing processors on DSP benchmarks and is on par with ARM on code-density benchmarks
• Blackfin is scalable across a broad set of applications
– ADSP-BF531 on the low end
– Dual-core ADSP-BF561 on the high end
• Latest peripheral integration expands connectivity to network-based applications
• Large set of options for OS and kernel support, including uClinux
Blackfin ADSP-BF536/537 Architecture Overview
Blackfin Architecture Basics
• Core
– Registers
– ALU, MAC, Shifter
– Data Addressing Modes
– Program Sequencer
– Event Controller
• Peripherals
• Instruction Set Overview
• Memory
– Architecture
– Cache
Section 1
Register File
Accessing Registers
• Blackfin processors are register-intensive devices
– All computations are performed on data contained in registers
– All peripherals are set up using registers
– Memory is accessed using pointers in address registers
• There are two types of Blackfin processor registers
– Core registers
– Memory-mapped registers (MMRs)
Blackfin Core Registers
• Core registers are accessed directly by name
– Data Registers: R0-R7
– Accumulator Registers: A0, A1
– Pointer Registers: P0-P5, FP, SP, USP
– DAG Registers: I0-I3, M0-M3, B0-B3, L0-L3
– Cycle Counters: CYCLES, CYCLES2
– Program Sequencer: SEQSTAT
– System Configuration Register: SYSCFG
– Loop Registers: LT[1:0], LB[1:0], LC[1:0]
– Interrupt Return Registers: RETI, RETX, RETN, RETE
Example:
R0 = SYSCFG; // Load data register with contents of SYSCFG register
Core Registers
• Data Registers (32-bit): R0-R7, each also accessible as 16-bit halves (R0.L/R0.H through R7.L/R7.H)
• Accumulators (40-bit): A0 and A1, each with a low half (A0.L, A1.L), a high half (A0.H, A1.H), and an 8-bit extension (A0.X, A1.X)
• Address Registers (32-bit): P0-P5, FP, SP, USP
• DAG Registers (32-bit): I0-I3, L0-L3, B0-B3, M0-M3
• Loop Registers: LC0/LC1 (Loop Counter), LT0/LT1 (Loop Top), LB0/LB1 (Loop Bottom)
• ASTAT (Arithmetic Status), RETS (Subroutine Return)
• RETI (Interrupt Return), RETX (Exception Return), RETN (NMI Return), RETE (Emulation Return)
• SYSCFG (System Config), SEQSTAT (Sequencer Status)
Some core registers, such as USP, are accessible only in Supervisor mode.
Memory-Mapped Registers (MMRs)
• Most registers are memory-mapped and must be accessed indirectly (through a pointer)
– Core MMRs are used to configure the core
• They are listed in Appendix A of the Hardware Reference Manual (HRM)
• All Core MMRs must be accessed with 32-bit reads or writes
– System MMRs are used to configure all other peripherals
• They are listed in Appendix B of the HRM
• Some System MMRs must be accessed with 32-bit reads or writes and others with 16-bit reads or writes (see the HRM for details)
• MMR addresses are defined in header files
– defBF53x.h for assembly
– cdefBF53x.h for C/C++
• MMRs can only be accessed in Supervisor mode
Assembly Example:
#include <defBF537.h>      // MMR address definitions (pick the header for your part)
P0.H = HI(SPI_RDBR);      // load upper 16 bits of the SPI Receive Register address into the pointer register
P0.L = LO(SPI_RDBR);      // load lower 16 bits of the SPI Receive Register address into the pointer register
R0 = W[P0] (Z);           // read the 16-bit SPI Receive Register (SPI_RDBR), zero-extended, into a data register
C/C++ Example:
#include <cdefBF537.h>     // defines the pSPI_RDBR pointer (pick the header for your part)
unsigned short temp;       // variable to hold the register contents
temp = *pSPI_RDBR;         // read the 16-bit SPI Receive Register
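The `P0.H =` and `P0.L =` moves in the assembly example each carry a 16-bit immediate, which is why the 32-bit MMR address is loaded in two halves. The host-side C sketch below mimics what the assembler's `HI()` and `LO()` operators produce and how the two halves recombine in a 32-bit pointer register. The helper names and the address used are hypothetical, for illustration only.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the assembler's HI() and LO() operators. */
static uint16_t hi16(uint32_t addr) { return (uint16_t)(addr >> 16); }
static uint16_t lo16(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }

/* Model of writing one half of a 32-bit pointer register (like P0.H = ...). */
static uint32_t set_high_half(uint32_t reg, uint16_t h) {
    return (reg & 0x0000FFFFu) | ((uint32_t)h << 16);  /* low half untouched */
}

/* Model of writing the other half (like P0.L = ...). */
static uint32_t set_low_half(uint32_t reg, uint16_t l) {
    return (reg & 0xFFFF0000u) | l;                    /* high half untouched */
}
```

For a hypothetical address 0xFFC01234, `set_low_half(set_high_half(0, hi16(a)), lo16(a))` rebuilds 0xFFC01234, just as the two moves leave the full address in `P0`.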
Section 2
Arithmetic Logic Units (ALU)
Arithmetic Logic Unit (ALU)
[Figure: Data Arithmetic Unit: data registers R0-R7 (with 16-bit halves R0.L-R7.L and R0.H-R7.H), two 32-bit load buses (LD0, LD1), one 32-bit store bus (SD), two 16 x 16 multipliers, four 8-bit video ALUs, two 40-bit ALUs with accumulators A0 and A1, and a 40-bit barrel shifter]
• Two 40-bit ALUs operate on 16-bit, 32-bit, and 40-bit input data and output 16-bit, 32-bit, and 40-bit results.
• Functions
– Fixed-point addition and subtraction
– Addition and subtraction of immediate values
– Accumulation and subtraction of multiplier results
– Logical AND, OR, NOT, XOR, bitwise XOR (LFSR), Negate
– Functions: ABS, MAX, MIN, Round, division primitives
– Supports conditional instructions
• Four 8-bit video ALUs
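The four 8-bit video ALUs let byte-oriented operations such as SAA (subtract-absolute-accumulate, listed later under the video instructions) work on four packed pixels at once. The sketch below models only the arithmetic on the host: four unsigned bytes packed in each 32-bit word, with their absolute differences summed into an accumulator. The function name is hypothetical, and the real instruction's operand forms and accumulator layout are not modeled.

```c
#include <stdint.h>

/* Hypothetical host-side model: sum of absolute differences of the four
   unsigned bytes packed in a and b, added to a running accumulator. */
static uint32_t sad4_accumulate(uint32_t acc, uint32_t a, uint32_t b) {
    for (int lane = 0; lane < 4; ++lane) {
        int x = (int)((a >> (8 * lane)) & 0xFFu);  /* byte lane from a */
        int y = (int)((b >> (8 * lane)) & 0xFFu);  /* same byte lane from b */
        acc += (uint32_t)(x > y ? x - y : y - x);  /* |x - y| */
    }
    return acc;
}
```

For example, comparing 0x01020304 with 0x04030201 gives 3 + 1 + 1 + 3 = 8, the kind of block-matching metric used in motion estimation.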
40-bit ALU Operations
• 40-bit ALU operations support the following combinations:
– Single 16-Bit Operations
– Dual 16-Bit Operations
– Quad 16-Bit Operations
– Single 32-Bit Operations
– Dual 32-Bit Operations
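As an example of a dual 16-bit operation, one instruction can add the high halves and the low halves of two data registers independently, optionally with saturation (the vector add form written `R0 = R1 +|+ R2 (S);`). The C sketch below models the saturating variant on the host; the function names are hypothetical, and it illustrates only the lane-wise arithmetic, not instruction timing.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical model of a dual 16-bit saturating add: the high and low
   halves are added independently, with no carry between the lanes. */
static uint32_t dual_add16_sat(uint32_t a, uint32_t b) {
    int16_t a_lo = (int16_t)(a & 0xFFFFu), b_lo = (int16_t)(b & 0xFFFFu);
    int16_t a_hi = (int16_t)(a >> 16),     b_hi = (int16_t)(b >> 16);
    uint16_t lo = (uint16_t)sat16((int32_t)a_lo + b_lo);
    uint16_t hi = (uint16_t)sat16((int32_t)a_hi + b_hi);
    return ((uint32_t)hi << 16) | lo;
}
```

With `a = 0x7FFF0001` and `b = 0x00010001`, the high lane saturates at 0x7FFF while the low lane yields 2, giving 0x7FFF0002.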
Section 3
Multiply-Accumulators (MAC)
Multiply-Accumulators (MAC)
• Two identical MACs
– Each performs fixed-point multiplication and multiply-accumulate operations on 16-bit fixed-point input data and outputs 32-bit or 40-bit results.
• Functions
– Multiplication
– Multiply-accumulate with addition
– Multiply-accumulate with subtraction
– Dual versions of the above
• Features
– Saturation of accumulator results
– Optional rounding of multiplier results
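In fractional (1.15) mode, a 16 x 16 product is shifted left by one bit so the result is in 1.31 format, and results accumulate in a 40-bit accumulator (32 bits plus the 8-bit A0.X / A1.X extension). The host-side sketch below models that behavior, including saturation to the 40-bit range and the (-1.0) x (-1.0) product that would overflow 1.31. Function names are hypothetical, and the rounding options are not modeled.

```c
#include <stdint.h>

#define ACC40_MAX ((int64_t)0x7FFFFFFFFFLL)  /* largest signed 40-bit value */
#define ACC40_MIN (-ACC40_MAX - 1)           /* smallest signed 40-bit value */

/* Clamp a value to the signed 40-bit accumulator range. */
static int64_t sat40(int64_t v) {
    if (v > ACC40_MAX) return ACC40_MAX;
    if (v < ACC40_MIN) return ACC40_MIN;
    return v;
}

/* 1.15 x 1.15 fractional multiply: the product is shifted left by one into 1.31.
   (-1.0) x (-1.0) cannot be represented in 1.31, so it saturates to the maximum. */
static int32_t frac_mul(int16_t x, int16_t y) {
    if (x == INT16_MIN && y == INT16_MIN) return INT32_MAX;
    return (int32_t)x * y * 2;
}

/* Hypothetical model of a saturating multiply-accumulate into a 40-bit accumulator. */
static int64_t mac40(int64_t acc, int16_t x, int16_t y) {
    return sat40(acc + frac_mul(x, y));
}
```

For example, 0.5 x 0.5 (0x4000 x 0x4000) produces 0x20000000, which is 0.25 in 1.31 format.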
Section 4
Barrel-Shifter (Shifter)
Barrel-Shifter (Shifter)
• Performs bitwise shifting for 16-bit, 32-bit or 40-bit inputs and yields 16-bit, 32-bit, or 40-bit outputs.
• Shift Functions
– Arithmetic Shifts preserve the sign of the original number. The sign bit value back-fills the left-most bit positions vacated by the arithmetic right shift.
– Logical Shifts discard any bits shifted out of the register and back-fill vacated bits with zeros.
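The difference between the two shift types shows up with negative numbers (in Blackfin assembly they are written `>>>` for arithmetic and `>>` for logical shifts). The C sketch below models both on 32-bit values; the function names are hypothetical. It implements the arithmetic shift explicitly because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Arithmetic right shift (n < 32): vacated high bits are filled with the sign bit. */
static int32_t ashift_right(int32_t v, unsigned n) {
    return v < 0 ? (int32_t)~(~(uint32_t)v >> n)  /* shift the complement, then undo it */
                 : (int32_t)((uint32_t)v >> n);
}

/* Logical right shift (n < 32): vacated high bits are filled with zeros. */
static uint32_t lshift_right(uint32_t v, unsigned n) { return v >> n; }
```

Shifting -8 (0xFFFFFFF8) right by one gives -4 arithmetically but 0x7FFFFFFC logically.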
Barrel-Shifter (Shifter)
• Additional Functions
– Rotate: Rotates a register value through the CC bit by a specified distance and direction.
– Bit Operations – Set, Clear, Toggle, Test
– Field Extract and Deposit
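A rotate through CC treats the 32-bit register and the CC bit as one 33-bit ring: bits leaving one end of the register pass through CC before re-entering at the other end. The sketch below models this on the host. The struct and function names are hypothetical, and the sign convention (positive distance = left) is an assumption for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a data register plus the CC bit. */
typedef struct {
    uint32_t reg;  /* 32-bit data register */
    bool cc;       /* CC (condition code) bit */
} RegWithCC;

/* Rotate left by one: bit 31 moves into CC, and the old CC enters bit 0. */
static RegWithCC rot_left1(RegWithCC s) {
    RegWithCC out;
    out.cc = (s.reg >> 31) & 1u;
    out.reg = (s.reg << 1) | (s.cc ? 1u : 0u);
    return out;
}

/* Rotate right by one: bit 0 moves into CC, and the old CC enters bit 31. */
static RegWithCC rot_right1(RegWithCC s) {
    RegWithCC out;
    out.cc = s.reg & 1u;
    out.reg = (s.reg >> 1) | (s.cc ? 0x80000000u : 0u);
    return out;
}

/* Rotate by a signed distance: positive = left, negative = right (assumed convention). */
static RegWithCC rot(RegWithCC s, int distance) {
    while (distance > 0) { s = rot_left1(s); --distance; }
    while (distance < 0) { s = rot_right1(s); ++distance; }
    return s;
}
```

Because the ring is 33 bits wide, rotating by 33 positions restores both the register and CC.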
Section 5
Data Addressing Modes
Address Registers
• One set of 32-bit general-purpose pointer registers: P0-P5, SP, and FP
• One set of 32-bit DSP buffer addressing registers: I0-I3, B0-B3, L0-L3, M0-M3
• All addresses are byte addresses into a 4 GB address space
• SP points to the supervisor stack in Supervisor mode and to the user stack in User mode
• USP is accessible in Supervisor mode only
– Allows access to the user stack location while in Supervisor mode
Addressing Methods
• Register Indirect Addressing
– Index Registers (32-bit and 16-bit accesses)
– Pointer Registers P0 – P5 (32-bit, 16-bit, and 8-bit accesses)
– Stack and Frame Pointer Registers (32-bit accesses)
• Types of address pointer modify
– Modify/Post-Modify
• Linear addressing
• Circular buffering / modulo addressing
– Enables automatic maintenance of pointers to stay within bounds of a circular buffer
• Bit-Reversal (Modify only)
– Pre-Modify with update (using Stack Pointer)
– Pre-Modify without update
Linear vs Circular Buffering
• Linear Buffer Access
– Index (I0:3) registers hold the address sent out on the address bus.
– Length (L0:3) registers are set to 0, which disables circular buffering.
• Default for the C compiler
• The compiler has provisions to allow circular buffers
– Modify (M0:3) registers contain the value (positive or negative) that is added to the I registers at the end of each memory access.
• Circular Buffer Access
– Base (B0:3) registers contain the circular buffer's start address.
– Length (L0:3) registers are set to the length of the circular buffer.
– The Modify (M0:3) value must be less than or equal to the length of the circular buffer.
– Indexing wraps around when the modified index reaches or exceeds Base + Length: the new index becomes I + M - L.
Circular Buffer Example
• Base address and starting index address: B0 = 0; I0 = 0;
• Buffer length is 44 bytes (L0 = 44;): there are 11 data elements, and each data element is 4 bytes
• Modify value is 16 (M0 = 16;): 4 elements x 4 bytes/element
[Figure: the 11 32-bit elements, holding sample values 0x00000001 through 0x0000000B, occupy byte addresses 0x00 through 0x28; arrows mark successive accesses]
• 1st access: address 0x00
• 2nd access: address 0x10
• 3rd access: address 0x20
• 4th access: address 0x04 (0x20 + 0x10 = 0x30 reaches Base + Length = 0x2C, so the index wraps to 0x30 - 0x2C)
• 5th access: address 0x14
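The post-modify and wrap rule can be modeled in a few lines of C. This is a host-side sketch, not compiler output: the `Dag` struct and `dag_access` function are hypothetical, and the model assumes |M| <= L, as the slide requires. Running it with the example's settings (B0 = 0, I0 = 0, L0 = 44, M0 = 16) reproduces the access sequence 0x00, 0x10, 0x20, 0x04, 0x14.

```c
#include <stdint.h>

/* Hypothetical model of one DAG register set: index, base, length (bytes), modify. */
typedef struct {
    uint32_t i, b, l;
    int32_t m;
} Dag;

/* Return the address used by this access, then post-modify I.
   With L == 0 the buffer is linear; otherwise I wraps to stay in [B, B + L). */
static uint32_t dag_access(Dag *d) {
    uint32_t addr = d->i;
    int64_t next = (int64_t)d->i + d->m;
    if (d->l != 0) {
        if (next >= (int64_t)d->b + d->l) next -= d->l;  /* wrapped past the end */
        else if (next < (int64_t)d->b) next += d->l;     /* wrapped below the base (negative M) */
    }
    d->i = (uint32_t)next;
    return addr;
}
```

Setting `l` to 0 turns the same function into plain linear post-increment addressing, which mirrors how L0:3 = 0 disables circular buffering.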
Section 6
Program Sequencer
Program Sequencer Features
• Controls all program flow
• Contains a 10-stage instruction pipeline
• Maintains in-program branching
– Subroutines
– Jumps
– Interrupts and Exceptions
• Maintains loops
– Includes zero-overhead loop registers
– No cost for wrapping from loop bottom to loop top
Blackfin Execution Pipeline
• 10-stage super-pipeline
• Sequencer ensures that the pipeline is fully interlocked and that all the data hazards are hidden from the programmer
• If an instruction requires data that must first be fetched, the pipeline stalls until that data is available
– See the EE-197 application note for a complete list of stalls and multi-cycle instructions: http://www.analog.com/ee-notes
Avoiding Pipeline Stalls
Most common numeric operations have no instruction latency
VisualDSP++ Pipeline Viewer highlights Stall and Kill conditions
Sequencer-Related Registers
Section 7
Event Controller
Events (Interrupts / Exceptions)
• The Event Controller manages 5 types of Events
– Emulation (via external pin)
– Reset (via SW or external pin)
– Non-Maskable Interrupt (NMI) - for events that require immediate processor attention (via SW, external pin, or Watchdog)
– Exception
– Interrupts
• Hardware Error
• Core Timer
• 9 General-Purpose Interrupts for servicing peripherals
– Can be custom prioritized for optimal system performance
• All events can be serviced by Interrupt Service Routines (ISR)
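The controller services the highest-priority pending event, where a lower IVG number means higher priority. The sketch below models that selection with three 16-bit masks named after the core's ILAT (latched requests), IMASK (enable bits), and IPEND (events in service) registers. It is a simplified assumption, not the hardware algorithm: it treats IVG0-IVG4 as always enabled, assumes nesting is allowed, and ignores the global-disable bit and other details documented in the HRM. Function names are hypothetical.

```c
#include <stdint.h>

/* Index of the lowest set bit (the highest-priority level), or -1 if none. */
static int lowest_set_bit(uint16_t bits) {
    for (int ivg = 0; ivg < 16; ++ivg)
        if (bits & (1u << ivg)) return ivg;
    return -1;
}

/* Simplified model: the IVG level serviced next, or -1 if nothing can
   preempt the event currently in service. */
static int next_event(uint16_t ilat, uint16_t imask, uint16_t ipend) {
    uint16_t enabled = (uint16_t)(imask | 0x001Fu);  /* IVG0-IVG4 treated as always enabled */
    int candidate = lowest_set_bit((uint16_t)(ilat & enabled));
    int active = lowest_set_bit(ipend);              /* highest-priority event in service */
    if (candidate < 0) return -1;                    /* nothing pending and enabled */
    if (active >= 0 && candidate >= active) return -1; /* not higher priority: must wait */
    return candidate;
}
```

For example, with IVG8 and IVG11 both latched and enabled and nothing in service, the model picks IVG8; if IVG7 is already being serviced, neither can preempt it.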
Interrupts vs. Exceptions
INTERRUPTS
• Hardware-generated
– Asynchronous to program flow
– Requested by a peripheral
• Software-generated
– Synchronous to program flow
– Generated by the RAISE instruction
• All instructions preceding the interrupt in the pipeline are killed
EXCEPTIONS
• Service Exception
– Return address (RETX) is the address following the excepting instruction
– Never re-executed
– The EXCPT instruction is in this category
• Error Condition Exception
– Return address (RETX) is the address of the excepting instruction
– Excepting instruction will be re-executed
The Blackfin is always in Supervisor Mode while executing Event Handler software and can be in User Mode only while executing application tasks.
BF533 System and Core Interrupt Controllers
Core events, listed from highest to lowest priority:
Event Source | IVG # | Core Event Name
Emulator | 0 | EMU
Reset | 1 | RST
Non Maskable Interrupt | 2 | NMI
Exceptions | 3 | EVSW
Reserved | 4 | -
Hardware Error | 5 | IVHW
Core Timer | 6 | IVTMR
General Purpose 7 | 7 | IVG7
General Purpose 8 | 8 | IVG8
General Purpose 9 | 9 | IVG9
General Purpose 10 | 10 | IVG10
General Purpose 11 | 11 | IVG11
General Purpose 12 | 12 | IVG12
General Purpose 13 | 13 | IVG13
General Purpose 14 | 14 | IVG14
General Purpose 15 | 15 | IVG15

System Interrupt Source | IVG # (1)
PLL Wakeup interrupt | IVG7
DMA error (generic) | IVG7
PPI error interrupt | IVG7
SPORT0 error interrupt | IVG7
SPORT1 error interrupt | IVG7
SPI error interrupt | IVG7
UART error interrupt | IVG7
RTC interrupt | IVG8
DMA 0 interrupt (PPI) | IVG8
DMA 1 interrupt (SPORT0 RX) | IVG9
DMA 2 interrupt (SPORT0 TX) | IVG9
DMA 3 interrupt (SPORT1 RX) | IVG9
DMA 4 interrupt (SPORT1 TX) | IVG9
DMA 5 interrupt (SPI) | IVG10
DMA 6 interrupt (UART RX) | IVG10
DMA 7 interrupt (UART TX) | IVG10
Timer0 interrupt | IVG11
Timer1 interrupt | IVG11
Timer2 interrupt | IVG11
PF interrupt A | IVG12
PF interrupt B | IVG12
DMA 8/9 interrupt (MemDMA0) | IVG13
DMA 10/11 interrupt (MemDMA1) | IVG13
Watchdog Timer Interrupt | IVG13
(1) Note: Default IVG configuration shown.
Event Processing Flow
Interrupt Service Routine (ISR)
• ISR address is stored in the Event Vector Table
– Used as the next fetch address when the event occurs
• Program Counter (PC) address is saved to a register
– RETI, RETX, RETN, or RETE, based on the event
• Always concludes with a “Return” instruction
– RTI, RTX, RTN, RTE (respectively)
– When executed, the PC is loaded with the address stored in RETI, RETX, RETN, or RETE to continue application code
• Optional nesting of higher-priority interrupts is possible
– See application note EE-192, which covers writing interrupt routines in C (http://www.analog.com/ee-notes)
Section 8
Blackfin Peripherals
Peripherals and Power Management
• Common Peripherals (All Blackfins)
– SPI, UART, SPORT, WD, RTC
– PPI
• BF534/BF536/BF537 Peripherals
– TWI, CAN
• BF536/BF537 Peripheral
– Ethernet
• DMA and Handshake DMA
• Power Manager
Three Serial Communication Peripherals
• SPI (Serial Peripheral Interface)
– High-speed SPI port (up to SCLK/4, max 33.25 MHz)
• Master/slave compatible, with control of up to 7 slave selects
• Single-duplex DMA (either TX or RX)
– Typically used to interface with serial EPROMs, CPUs, converters, and displays
• UART (Universal Asynchronous Receiver/Transmitter)
– PC-style UART port (baud rate up to SCLK/16, max 8.3125 MHz)
• Supports half-duplex IrDA SIR (9.6/115.2 Kbps rate)
• Autobaud detection support through the use of the timers
• Separate TX and RX DMA support
– Typically used for a maintenance port or for interfacing with slow serial peripherals
• SPORTs (Synchronous Serial Ports)
– High-speed serial port (up to SCLK/2, max 66.5 MHz)
• Variable word length support (3-32 bits)
• I2S-compatible
• Separate TX and RX DMA support
• 128 channels out of a 1024-channel window for TDM support
• Primary and secondary data channels
– Typically used for interfacing with CODECs and TDM data streams
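The maximum rates above all correspond to SCLK = 133 MHz (133/4 = 33.25, 133/16 = 8.3125, 133/2 = 66.5). The sketch below assumes the divisor relationships UART baud = SCLK / (16 x divisor) and SPI SCK = SCLK / (2 x SPI_BAUD) with SPI_BAUD >= 2, which is consistent with those maxima; check the HRM for your part before relying on them. The function names are hypothetical.

```c
#include <stdint.h>

/* UART: baud = SCLK / (16 * divisor). Choose the nearest divisor (at least 1). */
static uint32_t uart_divisor(uint32_t sclk_hz, uint32_t baud) {
    uint32_t d = (sclk_hz + 8u * baud) / (16u * baud);  /* rounded division */
    return d == 0 ? 1u : d;
}

/* SPI: SCK = SCLK / (2 * spi_baud), with spi_baud >= 2 (so SCK <= SCLK / 4).
   Round up so the resulting SCK never exceeds the requested rate. */
static uint32_t spi_baud_value(uint32_t sclk_hz, uint32_t sck_hz) {
    uint32_t v = (sclk_hz + 2u * sck_hz - 1u) / (2u * sck_hz);
    return v < 2u ? 2u : v;
}
```

At SCLK = 133 MHz, 115200 baud gives a divisor of 72 (an actual rate of about 115451 baud), and a 10 MHz SPI request gives SPI_BAUD = 7, for SCK of 9.5 MHz.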
Real-Time Clock Features
• Used to implement a real-time watch or “life counter”
– Time of day, alarm, stopwatch count-down, and elapsed time since last system reset
• Uses four counters: Seconds, Minutes, Hours, Days
• Equipped with two alarm features
– Daily and Day-And-Time
• Uses a dedicated 32.768 kHz crystal connected to RTXI / RTXO
– Can be prescaled to 1 Hz to count in real-time seconds
• Uses dedicated power supply pins
– Independent of any reset
• Can take the processor out of all low-power states
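The four counters behave like a chained counter: dividing the 32.768 kHz crystal by 32,768 yields a 1 Hz tick, and each tick carries through seconds, minutes, hours, and days. The sketch below is a host-side model of that chain only; the struct and function names are hypothetical, and it does not describe the RTC's register layout.

```c
#include <stdint.h>

#define RTC_XTAL_HZ 32768u  /* 32.768 kHz crystal; dividing by this gives 1 Hz */

typedef struct {
    uint32_t days;
    uint8_t hours, minutes, seconds;
} RtcTime;

/* Advance the counters by one second, carrying into minutes, hours, and days. */
static void rtc_tick(RtcTime *t) {
    if (++t->seconds < 60) return;
    t->seconds = 0;
    if (++t->minutes < 60) return;
    t->minutes = 0;
    if (++t->hours < 24) return;
    t->hours = 0;
    ++t->days;
}

/* Elapsed time in seconds represented by the counters. */
static uint64_t rtc_elapsed_seconds(const RtcTime *t) {
    return ((uint64_t)t->days * 24u + t->hours) * 3600u
         + (uint64_t)t->minutes * 60u + t->seconds;
}

/* Whole seconds covered by a count of crystal cycles (the 1 Hz prescale). */
static uint64_t rtc_seconds_from_cycles(uint64_t cycles) { return cycles / RTC_XTAL_HZ; }
```

One tick from 23:59:59 on day 0 rolls every counter over to day 1, 00:00:00, which is 86,400 elapsed seconds.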
PPI – What is it?
• Parallel Peripheral Interface
– Programmable bus width (from 8 – 16 bits in 1-bit steps)
– Bidirectional (half-duplex) parallel interface
– Synchronous Interface
• Interface is driven by an external clock (“PPI_CLK”)
• Up to 66MHz rate (SCLK/2)
• Asynchronous to SCLK
– Includes three frame syncs to control the interface timing
– Applications
• Driving LCD Interface
• General Purpose Interface to outside world
• High speed data converters
• Video CODECs
TWO-WIRE INTERFACE (TWI)
• Fully compliant with the Philips I2C bus protocol
– See Philips I2C Bus Specification version 2.1
• 7-bit addressing
• 100 Kb/s (normal mode) and 400 Kb/s (fast mode) data rates
• General call address support
• Supports master and slave operation
– Separate receive and transmit FIFOs
• SCCB (Serial Camera Control Bus) support
– Only in master mode
– Slave mode cannot be used because the TWI controller always issues an Acknowledge in slave mode
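With 7-bit addressing, the first byte of an I2C transfer carries the slave address and a read/write bit. The sketch below shows only that byte layout, as defined by the I2C specification, and not how the TWI registers are programmed. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* First byte of a 7-bit-addressed I2C transfer: address in bits 7..1,
   R/W in bit 0 (1 = read, 0 = write). */
static uint8_t i2c_address_byte(uint8_t addr7, bool read) {
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}

/* The general call address is 0x00 with the R/W bit cleared. */
static bool is_general_call(uint8_t first_byte) { return first_byte == 0x00u; }
```

For example, a device at 7-bit address 0x50 is written with first byte 0xA0 and read with 0xA1.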
Controller Area Network (CAN)
• Adheres fully to the CAN V2.0B standard
– Supports both standard (11-bit) and extended (29-bit) identifiers
– Data rates up to 1 Mbit/second
• 32 configurable mailboxes
– 8 dedicated transmitters and 8 dedicated receivers
– 16 configurable (transmit or receive)
• Dedicated acceptance mask for each mailbox
• Data filtering (first two bytes) can be used for acceptance filtering
• CAN wakeup from Hibernation mode (lowest static power consumption)
• CAN protocol stacks
– Automotive: CAN drivers and protocol stacks through Vector CANtech
– Industrial: leading third parties will provide a full industrial suite for CANopen, DeviceNet, etc.
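A per-mailbox acceptance mask decides which identifier bits must match the mailbox's programmed identifier. The sketch below models that comparison on the host. The struct and function names are hypothetical, and the mask polarity (a set bit means "must match") is a convention chosen for this sketch; the Blackfin acceptance mask registers may use the opposite polarity, so check the HRM.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAN_STD_ID_MASK 0x000007FFu  /* 11-bit standard identifier */
#define CAN_EXT_ID_MASK 0x1FFFFFFFu  /* 29-bit extended identifier */

/* Hypothetical mailbox filter: bits set in compare_mask must match mailbox_id;
   bits clear in compare_mask are "don't care" (polarity assumed for this sketch). */
typedef struct {
    uint32_t mailbox_id;
    uint32_t compare_mask;
    bool extended;
} MailboxFilter;

static bool can_accept(const MailboxFilter *f, uint32_t rx_id, bool rx_extended) {
    if (rx_extended != f->extended) return false;  /* frame format must match */
    uint32_t width = f->extended ? CAN_EXT_ID_MASK : CAN_STD_ID_MASK;
    return ((rx_id ^ f->mailbox_id) & f->compare_mask & width) == 0;
}
```

With identifier 0x120 and mask 0x7F0, the mailbox accepts the 16 standard identifiers 0x120 through 0x12F and rejects everything else.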
ADSP-BF536/537 Family Ethernet MAC Features
The ADSP-BF536/537 Ethernet MAC has advanced features beyond IEEE 802.3.
For improved performance:
• Automatic checksum computation for IP header and payload on RX frames
• Programmable RX data alignment mode for 32-bit alignment
• Independent RX and TX DMA channels, with delivery of frame status to memory
• System wakeup on Magic Packet and on 4 user-definable wakeup frame filters
For lower overall system cost:
• No PHY crystal required: buffered XTAL output from the processor feeds the PHY
• Connection to either an MII or RMII PHY
The ADSP-BF536/537 enhances throughput and dataflow via these features:
• Enhanced DMA channels allow for processor core independence
• Direction control to exploit SDRAM characteristics
• Four SDRAM rows can be ‘open’ at any given time
ADSP-BF536/537 overall networking bandwidth:
• Full 100 Mbps wire speed on 1400-byte payloads with an optimized networking stack
– UDP: ~44% processor core loading
– TCP/IP: ~75% processor core loading
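The RX checksum offload saves the core from computing the Internet checksum (the 16-bit ones'-complement sum described in RFC 1071) in software. For reference, the sketch below is a plain software version of that checksum; it is not a model of the MAC's internal logic, and the exact coverage and reporting of the hardware result are described in the HRM. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over a byte buffer of big-endian 16-bit words. */
static uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1u)
        sum += (uint32_t)data[len - 1] << 8;  /* pad an odd trailing byte with zero */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);  /* fold carries back into 16 bits */
    return (uint16_t)~sum;
}
```

For the sample IPv4 header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7 (checksum field zeroed), the function returns 0xB861; running it again over the header with 0xB861 filled in returns 0, which is how a receiver validates the header.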
ADSP-BF536/537 DMA Enhancements
• 4 additional DMA channels
– All 12 peripheral DMA channels can be assigned to any of the peripherals
• Gives the Ethernet MAC further control over its assigned DMA channels
– Can reload DMA registers if an incorrect checksum is detected
• Two external handshaking memory DMA controllers
– Good for asynchronous FIFOs or off-chip interface controllers between Blackfin memory and hardware buffers
Blackfin – Dynamic Power Management Increases Battery Life
• Variable Frequency
– Clock dividers (1x to 63x) enable low-latency changes in system performance
• Variable Voltage
– On-chip voltage regulator generates an accurate voltage from a 2.25-3.6 V input
– Core voltage programmable from 0.8 V to 1.2 V (50 mV increments)
– Maximum 40 µs latency for the PLL to relock (frequency or voltage changes)
• System Cost Reduction
[Figure: Power (mW) versus operating point for audio and video processing, comparing frequency-only scaling with voltage and frequency scaling:
– 600 MHz, 1.2 V: 264 mW
– 200 MHz, 1.2 V (frequency only): 156 mW
– 200 MHz, 0.8 V (voltage and frequency): 90 mW
– 500 MHz at 1.2 V and at 1.0 V also shown]
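A common first-order approximation for the dynamic (switching) power of CMOS logic shows why lowering the voltage along with the frequency pays off. This is a textbook model, not a Blackfin specification, and the measured figures above also include static power, so they do not follow it exactly:

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f

\frac{P_{\text{dyn}}(200\,\text{MHz},\ 0.8\,\text{V})}{P_{\text{dyn}}(600\,\text{MHz},\ 1.2\,\text{V})}
  = \frac{200}{600}\left(\frac{0.8}{1.2}\right)^{2}
  = \frac{1}{3}\cdot\frac{4}{9}
  = \frac{4}{27} \approx 0.15
```

Scaling frequency alone reduces the dynamic term only to 1/3; the additional 4/9 factor comes from the squared voltage term.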
Section 9
Instruction Set Overview
Instruction Set Description
• Full-featured flexible multifunction instructions
• Employs an algebraic-style syntax
• Optimized to allow access to many of the processor core resources within a single instruction
• Compiled C and C++ source code makes optimal use of instructions
• Format designed for ease of coding and readability
• Tuned to generate dense code (small memory size footprint)
Blackfin Assembly Language Features
• Multi-issue load/store modified-Harvard architecture supports
– Two 16-bit MACs or four 8-bit ALUs, plus two loads/stores and two pointer updates per cycle
• Unified 4 GB memory space
– All registers, I/O, and memory are mapped into a unified 4 GB memory space
– Provides a simplified programming model
• Microcontroller features:
– Arbitrary bit and bit-field manipulation, insertion, and extraction
– Integer operations on 8-, 16-, and 32-bit data types
– Separate user and supervisor stack pointers
• Code density enhancements
– Intermixing of 16- and 32-bit instructions (no mode switching, no code segregation)
– Frequently used instructions are encoded in 16 bits
Blackfin DSPs Code Density
• Instruction set tuned for compact code
• Multi-length instructions
– 16- and 32-bit opcodes
– Limited multi-issue (64-bit multi-op packets)
• Compact call/return
• No memory alignment restrictions for code
– Transparent alignment hardware
• Blackfin supports 16- and 32-bit memory systems
[Figure: Instruction Formats: 16-bit, 32-bit, and 64-bit multi-op packet instructions packed into 16-bit-wide and 32-bit-wide memory]
No memory alignment restrictions: maximum code density and minimum system memory cost
Blackfin Code Density Features
Free intermixing of 16/32-bit instructions - no mode switching, no code segregation
Frequently used instructions encoded as 16-bits
3-bit register fields
Conditional moves
Push/Pop multiple registers
Three operand instructions
Single condition bit and evaluation
Blackfin Dual Operational Model
A DSP with a RISC instruction set, an MMU, an event controller, and a wide range of peripherals
• Data Movement: LD, ST (8, 16, 32 bits; unsigned or sign-extended); register moves (P-D-DAG); Push, Pop, Push/Pop multiple; CC to Dreg, etc.
• Addressing Modes: auto-increment, auto-decrement; pre-decrement store on SP; indirect; indexed with immediate offset; post-increment with non-unity stride; byte addressable
• Program Control: BRCC, UJUMP, Call, RETS, Loop Setup
• Arithmetic: +, -, *, /, >>>, Negate; 2- and 3-operand instructions
• Logical: AND, OR, XOR, NOT; BITTST, BITSET, BITTGL, BITCLR; CC ops; <<, >>
• Video: SAA; byte ops (residual calculation, spatial interpolation, spatial filter)
• Cache Control: Prefetch, Flush
• Supervisor/user modes
• Memory management
• Event control
• Wide range of peripherals
Blackfin MicroController Features
Arbitrary bit and bit-field manipulation, insertion and extraction
Integer operations on 8/16/32-bit data types
Memory protection and separate user and supervisor stack pointers
Scratch SRAM for context switching
Population and leading digit counting
Byte addressing DAGs
Compact Code Density
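Population counting and leading-bit counting are available as single Blackfin instructions (ONES and SIGNBITS). We assume these are what the "population and leading digit counting" item refers to. The portable C sketch below models the underlying operations: counting set bits, and counting redundant sign bits, which is the left shift that normalizes a signed value (as in block floating-point). The function names are hypothetical.

```c
#include <stdint.h>

/* Population count: number of bits set to 1. */
static unsigned popcount32(uint32_t v) {
    unsigned n = 0;
    while (v) { v &= v - 1u; ++n; }  /* clear the lowest set bit each pass */
    return n;
}

/* Redundant sign bits: how many bits below the sign bit equal it, i.e. how far
   the value can be shifted left without changing its sign. */
static unsigned redundant_sign_bits(int32_t v) {
    uint32_t u = (uint32_t)v;
    uint32_t sign = u >> 31;
    unsigned n = 0;
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
        ++n;
    return n;
}
```

For example, the value 1 has 30 redundant sign bits, so shifting it left by 30 gives 0x40000000, the largest shift that keeps it positive.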
Section 10
Blackfin Memory
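ADSP-BF536/7 at a Glance
[Figure: ADSP-BF536/537 system view: Blackfin processor core (525 MHz) with L1 instruction memory, L1 data memories A and B, DMA, and an external bus with direction control connecting to SDRAM; a 25 MHz crystal feeds both the PLL and the Ethernet PHY; bus widths of 64, 32, and 16 bits are marked]
• L1 memory is large enough to run application code; cache is available if operation from SDRAM is desired
• 4 sub-banks allow 2 core accesses (2 core fetches, or 1 fetch and 1 store) at the same time as a DMA access
• Rows are “open” in 4 SDRAM banks, which reduces page activation and makes best use of SDRAM
• External bus with direction control; maximum bandwidth 266 MB/sec
• No need for a second crystal: the 25 MHz XTAL also clocks the Ethernet PHY
• PLL VCO (1:64X), 131 MHz
• Programmable frequency and voltage control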
Memory Hierarchy on the Blackfin
• As processor speeds increase (300 MHz to 1 GHz), it becomes increasingly difficult to have large memories running at full speed.
• The BF5xx uses a memory hierarchy with a primary goal of achieving memory performance similar to that of the fastest memory (i.e., L1) with an overall cost close to that of the least expensive memory (i.e., L2).
[Figure: Memory hierarchy, from fastest to slowest:
– Core (registers)
– L1 memory: internal, smallest capacity, single-cycle access
– L2 memory: external, larger capacity, higher latency
– L3 memory: external, largest capacity, highest latency]
Memory Architecture: The Basics
• Core: >600 MHz
• L1 instruction memory and L1 data memory (on-chip): single cycle to access, 10s of Kbytes, >600 MHz
• Unified L2 (on-chip): several cycles to access, 100s of Kbytes, >300 MHz
• Unified L3 external memory (off-chip): several system cycles to access, 100s of Mbytes, <133 MHz
• DMA
Configurable Memory
• Best system performance can be achieved when executing code or fetching data out of L1 memory
• Two methods can be used to fill L1 memory: caching and dynamic downloading; the Blackfin processor supports both
– General-purpose processors have typically used the caching method, as they often have large programs residing in external memory and determinism is not as important.
– DSPs have typically used dynamic downloading, as they need direct control over which code runs in the fastest memory.
• Blackfin processors allow the programmer to choose one or both methods to optimize system performance.
What is Cache?
• In a hierarchical memory system, cache is the first level of memory reached once the address leaves the core (i.e., L1)
– If the instruction/data word (8, 16, 32, or 64 bits) that corresponds to the address is in the cache, there is a cache hit and the word is forwarded to the core from the cache.
– If the word that corresponds to the address is not in the cache, there is a cache miss. This causes a fetch of a fixed-size block (which contains the requested word) from main memory.
• The Blackfin allows the user to specify which regions (i.e., pages) of main memory are cacheable and which are not through the use of CPLBs (Cacheability Protection Lookaside Buffers), covered later.
– If a page is cacheable, the block (i.e., a cache line containing 32 bytes) is stored in the cache after the requested word is forwarded to the core
– If a page is non-cacheable, the requested word is simply forwarded to the core
Cache Hits and Misses
• A cache hit occurs when the address for an instruction fetch request from the core matches a valid entry in the cache.
• A cache hit is determined by comparing the upper 18 bits, and bits 11 and 10 of the instruction fetch address to the address tags of valid lines currently stored in a cache set.
• Only valid cache lines (i.e. cache lines with their valid bits set) are included in the address tag compare operation.
• When a cache hit occurs, the target 64-bit instruction word is sent to the instruction alignment unit where it is stored in one of two 64-bit instruction buffers.
• When a cache miss occurs, the instruction memory unit generates a cache line-fill access to retrieve the missing cache line from external memory to the core.
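Using the figures on these slides (16 KB instruction cache, 32-byte lines, 4-way set associative, four 4 KB sub-banks, and a tag made of the upper 18 address bits plus bits 11 and 10), an instruction address can be split as in the sketch below. The sub-bank and set-index positions (bits 13:12 and 9:5) are our assumption: they are the bits the tag and line offset leave over, and 4 sub-banks x 32 sets x 4 ways x 32 bytes = 16 KB. Confirm against the HRM before relying on them. All names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t tag;      /* 20-bit tag: address bits 31:14 followed by bits 11:10 */
    uint32_t sub_bank; /* address bits 13:12 (assumed) */
    uint32_t set;      /* address bits 9:5 (assumed), 32 sets per sub-bank */
    uint32_t offset;   /* address bits 4:0, byte within the 32-byte line */
} ICacheAddr;

static ICacheAddr icache_split(uint32_t addr) {
    ICacheAddr a;
    a.tag = ((addr >> 14) << 2) | ((addr >> 10) & 0x3u);
    a.sub_bank = (addr >> 12) & 0x3u;
    a.set = (addr >> 5) & 0x1Fu;
    a.offset = addr & 0x1Fu;
    return a;
}

/* One way of a set: its stored tag and valid bit. */
typedef struct { uint32_t tag; int valid; } Way;

/* Hit check within one 4-way set: only valid lines take part in the tag compare. */
static int icache_lookup(const Way ways[4], uint32_t addr) {
    uint32_t tag = icache_split(addr).tag;
    for (int w = 0; w < 4; ++w)
        if (ways[w].valid && ways[w].tag == tag) return w;  /* hit: way number */
    return -1;                                              /* miss: line fill needed */
}
```

A miss (`-1`) corresponds to the line-fill access described above; an invalid way is skipped even if its stale tag happens to match.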
L1 Instruction Memory: 16 KB Configurable Bank
[Figure: four 4 KB sub-banks, accessed by instruction fetches, by DMA over the DMA Core Bus (DCB), and by cache line fills over the External Access Bus (EAB)]
16 KB cache
• 4-way set associative with arbitrary locking of ways and lines
• LRU replacement
• No DMA access
16 KB SRAM
• Four 4 KB single-ported sub-banks
• Allows simultaneous core and DMA accesses to different banks
L1 Data Memory: 16 KB Configurable Bank
[Figure: four 4 KB sub-banks, accessed by the Data 0 and Data 1 core buses, by DMA over the DMA Core Bus (DCB), and by cache line fills over the External Access Bus (EAB)]
The block is multi-ported when:
• accessing different sub-banks, OR
• making one odd and one even access (address bit 2 different) within the same sub-bank.
When used as cache:
• Each bank is 2-way set-associative
• No DMA access
• Allows simultaneous dual DAG access
When used as SRAM:
• Allows simultaneous dual DAG and DMA access
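The multi-porting rule for L1 data memory can be written as a small predicate. The sketch assumes each 4 KB sub-bank occupies a contiguous 4 KB address range within the 16 KB bank, so address bits 13:12 select the sub-bank; that bit assignment is derived from the sub-bank size and is not stated on the slide. Bit 2 distinguishes the odd and even 32-bit words, as the slide says. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sub-bank number within a 16 KB L1 data bank (assumed: address bits 13:12). */
static uint32_t l1d_sub_bank(uint32_t addr) { return (addr >> 12) & 0x3u; }

/* True if two data accesses to the same L1 data bank can proceed together:
   different sub-banks, or the same sub-bank with address bit 2 different. */
static bool l1d_can_dual_access(uint32_t a, uint32_t b) {
    if (l1d_sub_bank(a) != l1d_sub_bank(b)) return true;
    return ((a ^ b) & 0x4u) != 0;
}
```

Under these assumptions, two accesses 4 bytes apart in the same sub-bank can proceed together, but two accesses 8 bytes apart (same bit 2) collide; placing two data arrays in different sub-banks avoids the question entirely.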