Post on 13-Nov-2014
1TM 139v10 The ARM Architecture
Day 10 Agenda
Exceptions
System Design
Memory Interface
Synchronization
Input / Output
Exception Handling
When an exception occurs, the ARM:
  Copies CPSR into SPSR_<mode>
  Sets appropriate CPSR bits:
    Change to ARM state
    Change to exception mode
    Disable interrupts (if appropriate)
  Stores the return address in LR_<mode>
  Sets PC to vector address
To return, the exception handler needs to:
  Restore CPSR from SPSR_<mode>
  Restore PC from LR_<mode>
This can only be done in ARM state.
Vector Table
  0x1C  FIQ
  0x18  IRQ
  0x14  (Reserved)
  0x10  Data Abort
  0x0C  Prefetch Abort
  0x08  Software Interrupt
  0x04  Undefined Instruction
  0x00  Reset
The vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices.
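As a rough illustration of the vector table layout, the sketch below computes a vector address from the exception type. The names exception_t and vector_address are hypothetical and chosen for this example only; the offsets and the 0xFFFF0000 high-vector base come from the slide.

```c
#include <stdint.h>

/* Exception types in vector-table order, so that offset = index * 4.
 * These names are illustrative, not taken from any ARM header. */
typedef enum {
    EXC_RESET = 0,
    EXC_UNDEF,
    EXC_SWI,
    EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT,
    EXC_RESERVED,
    EXC_IRQ,
    EXC_FIQ
} exception_t;

/* Returns the address the PC is forced to when the exception is taken.
 * high_vectors selects the 0xFFFF0000 table instead of the one at 0x0. */
uint32_t vector_address(exception_t exc, int high_vectors)
{
    uint32_t base = high_vectors ? 0xFFFF0000u : 0x00000000u;
    return base + 4u * (uint32_t)exc;
}
```

For example, vector_address(EXC_IRQ, 0) gives 0x18 and vector_address(EXC_FIQ, 1) gives 0xFFFF001C.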
PSR Mode Bit Values
Normal and High Vector Address
Reset
When the nRESET signal goes LOW, the core abandons the executing instruction.
When nRESET goes HIGH again, the core:
  Overwrites R14_svc and SPSR_svc by copying the current values of the PC and CPSR into them. The value of the saved PC and SPSR is not defined.
  Forces M[4:0] to 10011 (Supervisor mode), sets the I and F bits in the CPSR, and clears the CPSR's T bit.
  Forces the PC to fetch the next instruction from address 0x00.
  Resumes execution in ARM state.
Undefined Exception
When the core comes across an instruction which it cannot handle, it takes the undefined instruction trap. This mechanism may be used to extend either the THUMB or ARM instruction set by software emulation.
Entering the Undefined Instruction trap
  R14_und = Address of the next instruction after the undefined instruction
  SPSR_und = CPSR
  CPSR[4:0] = 0b11011 (mode bits forced to Undefined mode)
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x04 or 0xFFFF0004
After emulating the failed instruction, the trap handler should execute the following, irrespective of the state (ARM or Thumb):
  CPSR = SPSR_und
  MOVS PC,R14_und (This restores the CPSR and returns to the instruction following the undefined instruction)
Software Interrupts
The software interrupt instruction (SWI) is used for entering Supervisor mode, usually to request a particular supervisor function.
Entering SWI
  R14_svc = Address of next instruction after the SWI instruction
  SPSR_svc = CPSR
  CPSR[4:0] = 0b10011
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x08 or 0xFFFF0008
Upon exiting SWI
  CPSR = SPSR_svc
  MOVS PC,R14_svc (This restores the PC and CPSR, and returns to the instruction following the SWI)
SWI instruction encoding
  Bits [31:28]  Condition field (Cond)
  Bits [27:24]  1 1 1 1
  Bits [23:0]   SWI number (ignored by processor)
Pre-fetch Abort Instruction
If a pre-fetch abort occurs, the pre-fetched instruction is marked as invalid, but the exception will not be taken until the instruction reaches the head of the pipeline. If the instruction is not executed (for example, because a branch occurs while it is in the pipeline), the abort does not take place.
Entering Pre-Fetch Abort
  R14_abt = Address of aborted instruction + 4
  SPSR_abt = CPSR
  CPSR[4:0] = 0b10111
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x0C or 0xFFFF000C
Upon exiting Pre-Fetch Abort
  CPSR = SPSR_abt
  SUBS PC,R14_abt, #4 (This restores the PC and CPSR, and re-executes the aborted instruction, since R14_abt holds its address + 4)
Data Abort
If a data abort occurs, the action taken depends on the instruction type:
  Single data transfer instructions (LDR, STR) write back modified base registers: the Abort handler must be aware of this.
  The swap instruction (SWP) is aborted as though it had not been executed.
  Block data transfer instructions (LDM, STM) complete. If write-back is set, the base is updated. If the instruction would have overwritten the base with data (i.e. it has the base in the transfer list), the overwriting is prevented. All register overwriting is prevented after an abort is indicated, which means in particular that R15 (always the last register to be transferred) is preserved in an aborted LDM instruction.
The abort mechanism allows the implementation of a demand paged virtual memory system. In such a system the processor is allowed to generate arbitrary addresses. When the data at an address is unavailable, the Memory Management Unit (MMU) signals an abort.
Data Abort
The abort handler must then work out the cause of the abort, make the requested data available, and retry the aborted instruction. The application program needs no knowledge of the amount of memory available to it, nor is its state in any way affected by the abort.
Entering Data Abort
  R14_abt = Address of aborted instruction + 8
  SPSR_abt = CPSR
  CPSR[4:0] = 0b10111
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x10 or 0xFFFF0010
Upon exiting Data Abort
  CPSR = SPSR_abt
  SUBS PC,R14_abt, #8 (This restores the PC and CPSR, and re-executes the aborted instruction)
  SUBS PC,R14_abt, #4 (This restores the PC and CPSR, and returns to the instruction following the aborted instruction)
Interrupt Request (IRQ) Exception
The IRQ (Interrupt Request) exception is a normal interrupt caused by a LOW level on the nIRQ input. IRQ has a lower priority than FIQ and is masked out when a FIQ sequence is entered. It may be disabled at any time by setting the I bit in the CPSR, though this can only be done from a privileged (non-User) mode.
Entering IRQ
  R14_irq = Address of next instruction + 4
  SPSR_irq = CPSR
  CPSR[4:0] = 0b10010
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x18 or 0xFFFF0018
Exiting IRQ
  CPSR = SPSR_irq
  SUBS PC,R14_irq, #4 (This restores the PC and CPSR, and returns to the interrupted instruction)
Fast Interrupt Request (FIQ) Exception
The FIQ (Fast Interrupt Request) exception is designed to support a data transfer or channel process, and in ARM state has sufficient private registers to remove the need for register saving (thus minimizing the overhead of context switching).
FIQ is externally generated by taking the nFIQ input LOW. This input can accept either synchronous or asynchronous transitions, depending on the state of the ISYNC input signal. When ISYNC is LOW, nFIQ and nIRQ are considered asynchronous, and a cycle delay for synchronization is incurred before the interrupt can affect the processor flow.
Entering FIQ
  R14_fiq = Address of next instruction + 4
  SPSR_fiq = CPSR
  CPSR[4:0] = 0b10001
  CPSR[T,FIQ,IRQ] = 0b011 (ARM state, and disable FIQs and IRQs)
  Forces the PC to fetch the next instruction from address 0x1C or 0xFFFF001C
Exiting FIQ
  CPSR = SPSR_fiq
  SUBS PC,R14_fiq, #4 (This restores the PC and CPSR, and returns to the interrupted instruction)
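The CPSR updates listed on the exception-entry slides follow one pattern, which can be sketched in C as below. This is only a model of the bit manipulation, not code that runs in place of the hardware; the macro and function names are made up for the example, while the bit positions (M[4:0], T = bit 5, F = bit 6, I = bit 7) are the architectural ones.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu       /* M[4:0] */
#define CPSR_T         (1u << 5)   /* Thumb state bit */
#define CPSR_F         (1u << 6)   /* FIQ disable */
#define CPSR_I         (1u << 7)   /* IRQ disable */

/* Mode encodings as given on the slides. */
enum {
    MODE_FIQ = 0x11,  /* 0b10001 */
    MODE_IRQ = 0x12,  /* 0b10010 */
    MODE_SVC = 0x13,  /* 0b10011 */
    MODE_ABT = 0x17,  /* 0b10111 */
    MODE_UND = 0x1B   /* 0b11011 */
};

/* Models the CPSR value after exception entry: switch mode, force ARM
 * state, disable IRQs, and (for FIQ and Reset) also disable FIQs. */
uint32_t cpsr_on_entry(uint32_t cpsr, uint32_t mode, int disable_fiq)
{
    cpsr = (cpsr & ~CPSR_MODE_MASK) | mode;
    cpsr &= ~CPSR_T;
    cpsr |= CPSR_I;
    if (disable_fiq) {
        cpsr |= CPSR_F;
    }
    return cpsr;
}
```

Starting from User mode in Thumb state (0x30), an IRQ entry yields 0x92: mode bits 0b10010, T cleared, I set.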
Return Address Calculation
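Exception  Return Instruction        Previous State            Note
                                     ARM R14_x   THUMB R14_x
BL         MOV PC, R14               PC + 4      PC + 2        1
SWI        MOVS PC, R14_svc          PC + 4      PC + 2        1
UDEF       MOVS PC, R14_und          PC + 4      PC + 2        1
FIQ        SUBS PC, R14_fiq, #4      PC + 4      PC + 4        2
IRQ        SUBS PC, R14_irq, #4      PC + 4      PC + 4        2
PABT       SUBS PC, R14_abt, #4      PC + 4      PC + 4        1
DABT       SUBS PC, R14_abt, #8      PC + 8      PC + 8        3
RESET      NA                        -           -             4
Notes:
  1. PC is the address of the BL, SWI, or undefined instruction, or of the instruction fetch that had the prefetch abort.
  2. PC is the address of the instruction that was not executed because the FIQ or IRQ took priority.
  3. PC is the address of the load or store instruction that generated the data abort.
  4. The value saved in R14_svc upon reset is unpredictable.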
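The return instructions in the table above differ only in how much is subtracted from R14. A minimal C sketch of that arithmetic follows; the enum and function names are hypothetical, and the function only models the address computation, not the accompanying CPSR restore.

```c
#include <stdint.h>

typedef enum {
    RET_BL, RET_SWI, RET_UNDEF, RET_FIQ, RET_IRQ, RET_PABT, RET_DABT
} ret_kind_t;

/* Address loaded into the PC by the return instruction for each entry
 * kind, given the value held in R14 (or R14_<mode>). The DABT case
 * models the retry form (SUBS PC, R14_abt, #8). */
uint32_t return_pc(ret_kind_t kind, uint32_t r14)
{
    switch (kind) {
    case RET_FIQ:
    case RET_IRQ:
    case RET_PABT:
        return r14 - 4u;   /* SUBS PC, R14_x, #4 */
    case RET_DABT:
        return r14 - 8u;   /* SUBS PC, R14_abt, #8 */
    default:
        return r14;        /* MOV/MOVS PC, R14_x */
    }
}
```

For a data abort on a load at 0x8000 in ARM state, R14_abt holds 0x8008 and return_pc(RET_DABT, 0x8008) gives 0x8000, so the load is retried.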
Exception Priorities
Highest priority:
1. Reset
2. Data abort
3. FIQ
4. IRQ
5. Pre-fetch abort
Lowest priority:
6. Undefined Instruction and Software interrupt.
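One way to picture the priority list is as a scan over a set of pending exceptions, highest priority first. The sketch below is a software model with invented names (prio_t, highest_priority); the core resolves priorities in hardware, so this is illustration only.

```c
/* Priority levels in the order given on the slide (0 = highest). */
typedef enum {
    P_RESET = 0,
    P_DATA_ABORT,
    P_FIQ,
    P_IRQ,
    P_PREFETCH_ABORT,
    P_UNDEF_OR_SWI,
    P_NONE
} prio_t;

/* pending is a bitmask in which bit n is set when the exception with
 * priority level n is waiting. Returns the level that is taken first,
 * or P_NONE when nothing is pending. */
prio_t highest_priority(unsigned pending)
{
    for (int level = 0; level < P_NONE; level++) {
        if (pending & (1u << level)) {
            return (prio_t)level;
        }
    }
    return P_NONE;
}
```

With both an IRQ and a data abort pending, the data abort is selected.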
Agenda
Exceptions
System Design
Memory Interface
Synchronization
Input / Output
Example ARM-based System
[Figure: block diagram of an example system. Blocks shown: ARM Core, 16 bit RAM, 8 bit ROM, 32 bit RAM, I/O Peripherals, and an Interrupt Controller driving nFIQ and nIRQ.]
AMBA
AMBA: Advanced Microcontroller Bus Architecture
Open specification framework for System-on-Chip (SoC) designs
[Figure: AMBA-based system. Blocks shown: ARM, TIC, Arbiter, Decoder, On-chip RAM, External Bus Interface (to External ROM and External RAM), Bridge, Timer, Interrupt Controller, Remap/Pause, Reset. The System Bus is AHB or ASB; the Peripheral Bus is APB.]
AMBA
AHB: The widely adopted AHB System Bus connects embedded processors such as an ARM core to high-performance peripherals, DMA controllers, on-chip memory and interfaces.
APB: The AMBA APB (Advanced Peripheral Bus) is a simpler bus protocol designed for ancillary or general purpose peripherals.
ADK: The AMBA Design Kit is a library of components which enables system developers to build AMBA based systems quickly and accurately.
ACT: The AMBA Compliance Testbench, a comprehensive environment which enables the rapid development of tests to certify the IP as AMBA compliant.
PrimeCell: ARM's AMBA compliant peripherals.
Agenda
Exceptions
System Design
Memory Interface
Synchronization
Input / Output
Memory Interface
Memory Hierarchy
Memory Size and Speed
ARM MMU
Memory Interfacing
Memory
Memories come in many shapes, sizes and types
  Shapes means packages like TQFP, TSOP, DIP, Surface Mount
  Size: like 4Mx8-bit, 16Kx1-bit
Memory Technologies
DRAM: Dynamic Random Access Memory
  upside: very dense (1 transistor per bit) and inexpensive
  downside: requires refresh and often not the fastest access times
  often used for main memories
SRAM: Static Random Access Memory
  upside: fast and no refresh required
  downside: not so dense and not so cheap
  often used for caches
ROM: Read Only Memory
  often used for bootstrapping and such
[Figure: memory cell circuits, labelled with word line, pass transistor, capacitor, bit line, and nodes A and B]
Exploiting Memory Hierarchy
Users want large and fast memories! (1997 figures)
  SRAM access times are 2 - 25 ns at cost of $100 to $250 per Mbyte.
  DRAM access times are 60 - 120 ns at cost of $5 to $10 per Mbyte.
  Disk access times are 10 to 20 million ns at cost of $.10 to $.20 per Mbyte.
Try and give it to them anyway: build a memory hierarchy
[Figure: levels in the memory hierarchy, from the CPU through Level 1, Level 2, ... Level n. Access time increases with distance from the CPU, and the figure shows the size of the memory at each level.]
The Memory Pyramid
Locality
A principle that makes having a memory hierarchy a good idea
If an item is referenced,
temporal locality: it will tend to be referenced again soon
spatial locality: nearby items will tend to be referenced soon.
Why does code have locality?
Our initial focus: two levels (upper, lower)
  block: minimum unit of data
  hit: data requested is in the upper level
  miss: data requested is not in the upper level
Cache
Two issues:
  How do we know if a data item is in the cache?
  If it is, how do we find it?
Our first example:
  block size is one word of data
  "direct mapped"
For each item of data at the lower level, there is exactly one location in the cache where it might be.
  e.g., lots of items at the lower level share locations in the upper level
Direct Mapped Cache
[Figure: cache organization, labelled with the cache line index, 64 lines, and the CAM and RAM arrays]
Cache Memory
  64-way set-associative cache with I-Cache and D-Cache, 16KB each
  8-word line length, with one valid bit and two dirty bits per line
  Pseudo random or round robin replacement algorithm
  Write-through or write-back cache operation to update the main memory
  The write buffer can hold 16 words of data and four addresses.
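To make the cache parameters concrete, the sketch below splits an address into tag, set, and line offset for the geometry listed: 16 KB, 64 ways, 8-word lines. It assumes 32-bit (4-byte) words, which gives 32-byte lines and 16 KB / 32 B / 64 = 8 sets. The type and function names are invented for the example.

```c
#include <stdint.h>

#define CACHE_BYTES (16u * 1024u)   /* 16 KB per cache */
#define LINE_BYTES  (8u * 4u)       /* 8 words of 4 bytes (assumed) */
#define CACHE_WAYS  64u             /* 64-way set-associative */
#define CACHE_SETS  (CACHE_BYTES / LINE_BYTES / CACHE_WAYS)   /* 8 */

typedef struct {
    uint32_t tag;     /* compared against the 64 ways of the set */
    uint32_t set;     /* which of the 8 sets */
    uint32_t offset;  /* byte within the 32-byte line */
} cache_addr_t;

/* Splits a 32-bit address into the fields used for a cache lookup. */
cache_addr_t cache_split(uint32_t addr)
{
    cache_addr_t a;
    a.offset = addr % LINE_BYTES;
    a.set    = (addr / LINE_BYTES) % CACHE_SETS;
    a.tag    = addr / (LINE_BYTES * CACHE_SETS);
    return a;
}
```

For address 0x12345678 this yields offset 0x18, set 3, and tag 0x123456.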
Memory Interface
Memory Hierarchy
Memory Size and Speed
ARM MMU
Memory Interfacing
Storage Basics
CPU sees the RAM as one long, thin line of bytes
That doesn't mean that it's actually laid out that way
Real RAM chips don't store whole bytes, but rather they store individual bits in a grid, which you can address one bit at a time
SRAM Memory Timing for Read Accesses
Address and chip select signals are provided tAA before data is available
Outputs reflect new data
Example device: 2147H High-Speed 4096x1-bit static RAM (pins A11-A0, Din, WE, CS, Dout)
  tRC = Read cycle time
  tAA = Address access time
  tACS = Chip select access time
  tHZ = Chip deselection to high-Z out
[Figure: read timing diagram. The address bus A11-A0 changes from the old address to the new address; CS and WE are shown; Dout goes from high impedance through undefined to Data Valid. The intervals tRC, tAA, tACS, and tHZ are marked.]
SRAM Memory Timing for Write Accesses
Address and data must be stable tS time-units before the write enable signal falls
Example device: 2147H High-Speed 4096x1-bit static RAM (pins A11-A0, Din, WE, CS)
  tS = Signal setup time
  tWC = Write cycle time
  tRC = Read cycle time
  tAA = Address access time
  tACS = Chip select access time
  tHZ = Chip deselection to high-Z out
[Figure: write timing diagram. The address bus A11-A0 changes from the old address to the new address; Din changes from old data to new data; CS and WE are shown. The intervals tWC, tAA, tACS, tHZ, and tS are marked.]
DRAM Organization and Operations
In the traditional DRAM, any storage location can be randomly accessed for read/write by inputting the address of the corresponding storage location.
A typical DRAM of bit capacity 2^N * 2^M consists of an array of memory cells arranged in 2^N rows (word-lines) and 2^M columns (bit-lines).
Each memory cell has a unique location represented by the intersection of word and bit line.
Memory cell consists of a transistor and a capacitor. The charge on the capacitor represents 0 or 1 for the memory cell. The support circuitry for the DRAM chip is used to read/write to a memory cell.
DRAM Organization and Operations
Address decoders: to select a row and a column.
Sense amps: to detect and amplify the charge in the capacitor of the memory cell.
Read/Write logic: to read/store information in the memory cell.
Output Enable logic: controls whether data should appear at the outputs.
Refresh counters: to keep track of the refresh sequence.
DRAM Memory Access
DRAM memory is arranged in an XY grid pattern of rows and columns.
First, the row address is sent to the memory chip and latched, then the column address is sent in a similar fashion.
This row and column-addressing scheme (called multiplexing) allows a large memory address to use fewer pins.
The charge stored in the chosen memory cell is amplified using the sense amplifier and then routed to the output pin.
Read/Write is controlled using the read/write logic.
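The multiplexed addressing described above amounts to cutting one cell address into a row part and a column part that share the same pins. A small C sketch follows, assuming the row occupies the upper bits and the column the lower col_bits bits; that layout, and the names dram_addr_t and dram_split, are assumptions made for the example.

```c
#include <stdint.h>

typedef struct {
    uint32_t row;  /* sent first, latched by RAS */
    uint32_t col;  /* sent second, latched by CAS */
} dram_addr_t;

/* Splits a cell address for an array with 2^col_bits columns.
 * col_bits must be less than 32. */
dram_addr_t dram_split(uint32_t cell, unsigned col_bits)
{
    dram_addr_t a;
    a.row = cell >> col_bits;
    a.col = cell & ((1u << col_bits) - 1u);
    return a;
}
```

With 4 column bits, cell 0x2A5 becomes row 0x2A and column 0x5, so a 10-bit address needs only max(6, 4) = 6 address pins.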
How DRAM Works
DRAM Memory Access
A typical DRAM read operation:
1. The row address is placed on the address pins via the address bus.
2. The RAS pin is activated, which places the row address onto the Row Address Latch.
3. The Row Address Decoder selects the proper row to be sent to the sense amps.
4. The Write Enable is deactivated, so the DRAM knows that it's not being written to.
5. The column address is placed on the address pins via the address bus.
6. The CAS pin is activated, which places the column address on the Column Address Latch.
7. The CAS pin also serves as the Output Enable, so once the CAS signal has stabilized, the sense amps place the data from the selected row and column on the Data Out pin so that it can travel the data bus back out into the system.
8. RAS and CAS are both deactivated so that the cycle can begin again.
DRAM Performance Specs
Important DRAM Performance Considerations
  Random access time: time required to read any random single cell
  Fast Page Cycle time: time required for page mode access (read/write to a memory location on the most recently accessed page; no need to repeat RAS in this case)
  Extended Data Out (EDO): allows setup of next address while current data access is maintained
  SDRAM Burst Mode: Synchronous DRAMs use a self-incrementing counter and a mode register to determine the column address sequence after the first memory location accessed on a page; effective for applications that usually require streams of data from one or more pages on the DRAM
  Required refresh rate: minimum rate of refreshes
Turning Bits Into Bytes (2x This Picture)
Memory Interface
Memory Hierarchy
Memory Size and Speed
ARM MMU
Memory Interfacing
ARM MMU
Complex VM and protection mechanisms
Presents 4 GB address space (why?)
Memory granularity: 3 options supported
  1 MB sections
  Large pages (64 KBytes): access control within a large page on 16 KByte subpages
  Small pages (4 KBytes): access control within a small page on 1 KByte subpages
Puts processor in Abort Mode when virtual address not mapped or permission check fails
Change pointer to page tables (called the translation table base, in ARM jargon) to change virtual address space
  useful for context switching of processes
Example: Single-Level Page Table
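[Figure: single-level page table]
  Virtual address: bits 31-12 select the page table entry, bits 11-0 select the byte within the page
  Page table: 2^20 entries of 32 bits each; the selected entry (value = y) gives the page frame y
  Page: 2^12 entries of 8 bits each; the selected byte holds the data (value = x)
  Size of page table = 2^20 * 32 bits = 4 Mbytes
  Size of page = 2^12 * 8 bits = 4 Kbytes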
Single-Level Page Table
Assumptions
  32-bit virtual addresses
  4 Kbyte page size = 2^12 bytes
  32-bit address space
How many virtual page numbers?
  2^32 / 2^12 = 2^20 = 1,048,576 virtual page numbers = number of entries in the page table
If each page table entry occupies 4 bytes, how much memory is needed to store the page table?
  2^20 entries * 4 bytes = 2^22 bytes = 4 Mbytes
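The arithmetic above can be checked with a few lines of C. The sketch uses the slide's assumptions (32-bit virtual addresses, 4 Kbyte pages, 4-byte entries); the macro and function names are hypothetical.

```c
#include <stdint.h>

#define SL_PAGE_SHIFT 12u                            /* 4 Kbyte pages */
#define SL_PAGE_BYTES (1u << SL_PAGE_SHIFT)
#define SL_NUM_PAGES  (1u << (32u - SL_PAGE_SHIFT))  /* 2^20 entries */
#define SL_PTE_BYTES  4u                             /* 32-bit entries */

/* Virtual page number: index into the single-level page table. */
uint32_t sl_vpn(uint32_t va)
{
    return va >> SL_PAGE_SHIFT;
}

/* Byte offset within the 4 Kbyte page. */
uint32_t sl_offset(uint32_t va)
{
    return va & (SL_PAGE_BYTES - 1u);
}

/* Memory needed to hold the whole page table, in bytes. */
uint32_t sl_table_bytes(void)
{
    return SL_NUM_PAGES * SL_PTE_BYTES;
}
```

For virtual address 0x12345678 the page number is 0x12345 and the offset is 0x678; the table itself occupies 4 Mbytes.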
Example: Two level Page Table
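[Figure: two-level page table]
  Virtual address: bits 31-22 select the page directory entry, bits 21-12 select the page table entry, bits 11-0 select the byte within the page
  Page directory: 2^10 entries of 32 bits each; the selected entry (value = z) points to page table z
  Page table: 2^10 entries of 32 bits each; the selected entry (value = y) gives the page frame y
  Page: 2^12 entries of 8 bits each; the selected byte holds the data (value = x)
  Size of page directory = 2^10 * 32 bits = 4 Kbytes
  Size of page table = 2^10 * 32 bits = 4 Kbytes
  Size of page = 2^12 * 8 bits = 4 Kbytes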
Two-Level Page Table
Assumptions
  2^10 entries in page directory (= max number of page tables)
  2^10 entries in page table
  32 bits allocated for each page directory entry
  32 bits allocated for each page table entry
How much memory is needed?
  Page table size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes
  Page directory size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes
Two-Level Page Table
Small (typical) system
  One page table might be enough
  Page directory size + Page table size = 8 Kbytes of memory would suffice for virtual memory management
  How much physical memory could this one page table handle?
    Number of page tables * Number of page table entries * Page size = 1 * 2^10 * 2^12 bytes = 4 Mbytes
Large system
  You might need the maximum number of page tables
  Max number of page tables * Page table size = 2^10 directory entries * 2^12 bytes = 2^22 bytes = 4 Mbytes of memory would be needed for virtual memory management
  How much physical memory could these 2^10 page tables handle?
    Number of page tables * Number of page table entries * Page size = 2^10 * 2^10 * 2^12 bytes = 4 Gbytes
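A short C sketch of the two-level split, using the 10/10/12 bit layout from the slides. The struct and function names are made up for illustration; a real MMU performs this walk in hardware.

```c
#include <stdint.h>

typedef struct {
    uint32_t dir_index;    /* bits 31-22: entry in the page directory */
    uint32_t table_index;  /* bits 21-12: entry in the page table */
    uint32_t offset;       /* bits 11-0: byte within the page */
} va_fields_t;

/* Splits a 32-bit virtual address into its three lookup fields. */
va_fields_t tl_split(uint32_t va)
{
    va_fields_t f;
    f.dir_index   = va >> 22;
    f.table_index = (va >> 12) & 0x3FFu;
    f.offset      = va & 0xFFFu;
    return f;
}

/* Memory that num_page_tables page tables can map:
 * tables * 2^10 entries * 2^12 bytes per page. */
uint64_t tl_mappable_bytes(uint32_t num_page_tables)
{
    return (uint64_t)num_page_tables * 1024u * 4096u;
}
```

One page table maps 4 Mbytes and 2^10 page tables map 4 Gbytes, matching the small and large system cases.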
Memory Interface
Memory Hierarchy
Memory Size and Speed
ARM MMU
Memory Interfacing
Interfacing External Memory
Little/Big Endian support
Address space: 4G bytes, (Differs in processor Implementation)
Supports programmable 8/16/32-bit data bus width for each bank
External address lines vary for a specific processor implementation
Programmable bank start address and bank size for bank 7
Eight memory banks: Memory banks for ROM, SRAM or Synchronous DRAM
Fully Programmable access cycles for all memory banks
Supports external wait signals to extend the bus cycle
Supports self-refresh mode in SDRAM for power down
Supports various types of ROM for booting (NOR/NAND Flash, EEPROM, and others)
The write buffer can hold 16 words of data and four addresses.
CPU Memory Interface
CPU Memory Interface usually consists of:
  unidirectional address bus
  bidirectional data bus
  read control line
  write control line
  ready control line
  size (byte, word) control line
Memory access involves a memory bus transaction
  read: (1) set address, read and size, (2) copy data when ready is set by memory
  write: (1) set address, data, write and size, (2) done when ready is set
[Figure: CPU and Memory connected by the address bus, the data bus, and the Read, Write, Ready, and Size lines]
Memory Subsystem Components
Memory subsystems generally consist of chips + controller
  Each chip provides few bits (e.g., 1-4) per access
  Bits from multiple chips are accessed in parallel to fetch bytes and words
Memory controller decodes/translates address and control signals
  Controller can also be on memory chip
Example: contains 8 16x1-bit chips and a very simple controller
[Figure: a 16x8-bit memory array built from eight 16x1-bit memory chips supplying data bits D7-D0. A 1-of-16 decoder selects one of the addresses 0000 to 1111. The CPU connects to the memory through the address bus, the data bus, and the Read, Write, Ready, and Size lines.]
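The example array can be modelled in C as eight one-bit-wide chips read in parallel, with chip i supplying data bit Di. This is a software model under that assumption; chip16x1_t and the function names are invented for the sketch.

```c
#include <stdint.h>

#define NUM_CHIPS 8u

/* One 16x1-bit chip: bit a of cells is the cell at address a. */
typedef struct {
    uint16_t cells;
} chip16x1_t;

static unsigned chip_read(const chip16x1_t *c, unsigned addr)
{
    return (c->cells >> (addr & 0xFu)) & 1u;
}

static void chip_write(chip16x1_t *c, unsigned addr, unsigned bit)
{
    unsigned a = addr & 0xFu;
    c->cells = (uint16_t)((c->cells & ~(1u << a)) | ((bit & 1u) << a));
}

/* Reads one byte: every chip is given the same address and each
 * contributes one bit of the result. */
uint8_t array_read(const chip16x1_t chips[], unsigned addr)
{
    uint8_t value = 0;
    for (unsigned i = 0; i < NUM_CHIPS; i++) {
        value |= (uint8_t)(chip_read(&chips[i], addr) << i);
    }
    return value;
}

/* Writes one byte by storing bit i of value in chip i. */
void array_write(chip16x1_t chips[], unsigned addr, uint8_t value)
{
    for (unsigned i = 0; i < NUM_CHIPS; i++) {
        chip_write(&chips[i], addr, (value >> i) & 1u);
    }
}
```

Writing 0xB2 to address 3 and reading it back returns 0xB2, while other addresses are unaffected.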
EEPROM Interfacing
Memory Interface with 8-bit ROM
  ARM       MEMORY
  A0-A15    A0-A15
  D0-D7     DQ0-DQ7
  WE        WE
  OE        OE
  GCS       CE
Interfacing 8 - Bit Memory Banks
Memory Interface with 8-bit ROM x 2
Interfacing 16 - Bit Memory Banks
Memory Interface with 16-bit ROM x 2
Extra Signals
BE – Bank Enable
Interfacing Banked SDRAM
Memory Interface with 16-bit SDRAM x 2
Memory Interface with 16-bit SDRAM x 2
  ARM     SDRAM Signals   Description
  SCKE    SCKE            Clock Enable (High/Low)
  SCLK    SCLK            System Clock
  SCS0    SCS             Chip Select
  SRAS    SRAS            Row Address Strobe
  SCAS    SCAS            Column Address Strobe
  WE      WE              Write Enable
Signals in Interfacing SDRAM
Critical Thinking
It’s a commonly held belief that adding more RAM increases your performance. If you wanted to speed up your computer, what kind of RAM would you buy and why?
Agenda
Exceptions
System Design
Memory Interface
Synchronization
Input / Output
What is the Problem
Adding two array elements to another array element
LDR R0 A[0]
LDR R1 A[1]
ADD R2,R1,R0
STR R2 A[3]
Swapping the Variables
LDR R0 X
LDR R1 Y
STR R1 X
STR R2 Y
What to do ?????
The Solution
Adding two array elements to another array element
LDR R0 A[0]
LDR R1 A[1]
ADD R2,R1,R0
Bubble or other instructions
STR R2 A[3]
Swapping the Variables
LDR R0 X
LDR R1 Y
STR R0 Y
STR R1 X
That’s Synchronization
How to Achieve in ARM
SINGLE DATA SWAP (SWP)
[3:0] Source Register
[15:12] Destination Register
[19:16] Base Register
[22] Byte/Word Bit
0 = Swap word quantity
1 = Swap byte quantity
[31:28] Condition Field
SWP R0,R1,[R2]
Load R0 with the word addressed by R2, and store R1 at R2.
SWPB R2,R3,[R4]
Load R2 with the byte addressed by R4, and store bits 0 to 7 of R3 at R4.
SWPEQ R0,R0,[R1]
Conditionally swap the contents of the word addressed by R1 with R0.
How to Achieve in ARM
The data swap instruction is used to swap a byte or word quantity between a register and external memory. This instruction is implemented as a memory read followed by a memory write which are “locked” together (the processor cannot be interrupted until both operations have completed, and the memory manager is warned to treat them as inseparable). This class of instruction is particularly useful for implementing software semaphores.
The swap address is determined by the contents of the base register (Rn). The processor first reads the contents of the swap address. Then it writes the contents of the source register (Rm) to the swap address, and stores the old memory contents in the destination register (Rd). The same register may be specified as both the source and destination.
The LOCK output goes HIGH for the duration of the read and write operations to signal to the external memory manager that they are locked together, and should be allowed to complete without interruption. This is important in multi-processor systems where the swap instruction is the only indivisible instruction which may be used to implement semaphores; control of the memory must not be removed from a processor while it is performing a locked operation.
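The semaphore use of SWP can be sketched in portable C with an atomic exchange, which has the same read-then-write-as-one-step behaviour. Using C11 atomic_exchange as a stand-in for SWP is an assumption of this example, and swp_lock_t and its functions are hypothetical names.

```c
#include <stdatomic.h>

/* 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} swp_lock_t;

/* Swaps 1 into the lock word and looks at the old value, much like
 * SWP Rd, Rm, [Rn] with Rm = 1. Returns 1 if the lock was acquired. */
int swp_try_lock(swp_lock_t *lock)
{
    return atomic_exchange(&lock->locked, 1) == 0;
}

/* Spins until the exchange observes the lock as free. */
void swp_lock(swp_lock_t *lock)
{
    while (!swp_try_lock(lock)) {
        /* busy-wait; a real system might yield or sleep here */
    }
}

/* Releases the lock by storing 0 back. */
void swp_unlock(swp_lock_t *lock)
{
    atomic_store(&lock->locked, 0);
}
```

Because the old value comes back from the same indivisible operation that writes the new one, two contenders can never both see the lock as free.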
Processor Independent Techniques
Semaphores
Mutual Exclusion
Message Queues
Pipes … etc
Agenda
Exceptions
System Design
Memory Interface
Synchronization
Input / Output
CPU Bus I/O
CPU needs to talk with I/O devices such as keyboard, mouse, video, network, disk drive, LEDs
Memory mapped I/O
  Devices are mapped to specific memory locations just like RAM
  Uses load/store instructions just like accesses to memory
Ported I/O
  Special bus line and instructions
[Figure: two bus diagrams. Memory mapped I/O: the CPU shares the Address, Data, Read, and Write lines with Memory and the I/O Device. Ported I/O: the same lines plus an I/O Port line that distinguishes Memory from I/O accesses.]
I/O Register Basics
I/O Registers are NOT like normal memory
  Device events can change their values (e.g., status registers)
  Reading a register can change its value (e.g., error condition reset)
    so, for example, can't expect to get same value if read twice
  Some are read only (e.g., receive registers)
  Some are write only (e.g., transmit registers)
  Sometimes multiple I/O registers are mapped to same address
    selection of one based on other info (e.g., read vs. write or extra control bits)
The bits in a control register often each specify something different and important and have significant side effects
Cache must be disabled for memory mapped addresses
When polling I/O registers, should tell compiler that value can change on its own
  volatile int *ptr;
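A minimal polling loop using a volatile pointer might look like the sketch below. The STATUS_READY bit and the poll_ready function are hypothetical; a real device defines its own register layout and address.

```c
#include <stdint.h>

#define STATUS_READY 0x1u   /* hypothetical "ready" bit in a status register */

/* Polls a memory-mapped status register up to max_polls times.
 * The volatile qualifier forces a fresh read on every iteration, so
 * the compiler cannot hoist the load out of the loop.
 * Returns 1 once the ready bit is seen, 0 on timeout. */
int poll_ready(volatile uint32_t *status, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (*status & STATUS_READY) {
            return 1;
        }
    }
    return 0;
}
```

On hardware the pointer would be set to the device's register address; for experiments on a host, passing the address of an ordinary variable exercises the same logic.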
Up Next - Bus Architectures
Bus Protocols
Protocol refers to the set of rules agreed upon by both the bus master and bus slave
  Synchronous: bus transfers occur in relation to successive edges of a clock
  Asynchronous: bus transfers bear no particular timing relationship
  Semi-synchronous: operations/control initiate asynchronously, but data transfer occurs synchronously
[Figure: a CPU and Devices 1, 2, and 3 attached to a shared Bus]
Synchronous Bus Protocol
Transfer occurs in relation to successive edges of the system clock
Example:
  Memory address is placed on the address bus within a certain time, relative to the rising edge of the clock
  By the trailing edge of this same clock pulse, the address information has had time to stabilize, so the READ line is asserted
  Once the chip has been selected, then the memory can place the contents of the specified location on the data bus
[Figure: synchronous bus timing. Signals: Clock, Address, Master (CPU) RD, Master (CPU) CS, Data. The address bus carries the instruction address and then the data address, each unstable and then stable; the data bus returns the I-fetch and then the data. The decoding delay and access time are marked.]
Asynchronous Bus Protocol
No system clock used
Useful for systems where CPU and I/O devices run at different speeds
Example:
  Master puts address and data on the bus and then raises the Master signal ("there's some data")
  Slave sees Master signal, reads the data and then raises the Slave signal ("I've got it")
  Master sees Slave signal and lowers Master signal ("I see you got it")
  Slave sees Master signal lowered and lowers Slave signal ("I see you see I got it")
We call this exchange "handshaking"
[Figure: asynchronous bus timing for a write and a read, showing the Address, Master, Slave, and Data signals]
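The four steps of the handshake can be traced with a small software model. The sketch below is a sequential simulation only, with invented names (handshake_t, handshake_write); on a real bus the master and slave are independent hardware reacting to each other's signal edges.

```c
/* Bus state shared by master and slave in this model. */
typedef struct {
    int master;            /* Master signal level */
    int slave;             /* Slave signal level */
    unsigned char bus;     /* data currently driven on the bus */
    unsigned char latched; /* data captured by the slave */
} handshake_t;

/* Runs one complete write transfer and returns the byte the slave
 * latched. Each step only happens when the signal it waits on has
 * reached the expected level. */
unsigned char handshake_write(handshake_t *h, unsigned char data)
{
    /* 1. Master drives the bus, then raises Master. */
    h->bus = data;
    h->master = 1;

    /* 2. Slave sees Master high, reads the data, raises Slave. */
    if (h->master) {
        h->latched = h->bus;
        h->slave = 1;
    }

    /* 3. Master sees Slave high and lowers Master. */
    if (h->slave) {
        h->master = 0;
    }

    /* 4. Slave sees Master low and lowers Slave. */
    if (!h->master) {
        h->slave = 0;
    }

    return h->latched;
}
```

After the call both signals are back at 0, so the next transfer can start from the same idle state.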
Thank You
Any
Questions?