Unit 2_embedded system

PROCESSOR AND MEMORY ORGANIZATION [UNIT-II] V.V.C.E.T


EMBEDDED SYSTEMS

DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

UNIT - II

PROCESSOR AND MEMORY ORGANIZATION

• Structural units in a processor

• Selection of processor

• Memory devices

• DMA

• Memory management – Cache mapping techniques, dynamic allocation, fragmentation

• Interfacing processor, memory and I/O units

CASE STUDY: Required memory devices for

• Automatic washing machine

• Chocolate vending machine

• Digital camera and voice recorder

Prepared by

M.Sujith,

Lecturer,

Department of Electrical and Electronics Engineering,

Vidyaa Vikas College of Engineering and Technology.

HOD/EEE



EMBEDDED SYSTEM PROCESSOR CHIP OR CORE

MICROPROCESSOR

What is a microprocessor? A microprocessor is a multipurpose, clock-driven, register-based electronic device that reads binary instructions from a storage device called memory, accepts binary data as input, processes the data according to those instructions, and provides the results as output. Ex: 8085, 8086, Z80, 6800, Pentium processors, etc.

Why do we use microprocessors?

• They implement digital systems efficiently (they execute programs very efficiently).

• They make it easier to design families of products, which can be extended to meet rapidly changing market demands (optimized design in terms of achieving greater speed, i.e. processing power).



MICROCONTROLLER

What is a microcontroller? A microcontroller is essentially an entire computer on a single chip. Ex: Intel’s 8051 and 8096, Motorola M68HC11XX / M68HC12XX, PIC 16XX series, etc. It is an essential component of a control or communication unit.



Other Processors



STRUCTURAL UNITS IN A PROCESSOR

BUSES

1) Internal and external buses interconnect the processor internal units with the external system memories, I/O

devices and all other system elements

2) Address, data and control buses

MDR, MAR, BIU , PC and SP

3) MDR (memory data register) holds the accessed byte or word

4) MAR (memory address register) holds the address

5) BIU (Bus Interface Unit)

6) Program Counter or Instruction Pointer and

7) Stack Pointer Registers



8) ARS (Application Register Set): set of on-chip registers for use in the application program. The register set is also called a register file and is associated with an ALU or FLPU.

9) Register window: a subset of registers, with each subset storing the static variables and status words of a task or program thread. Changing windows helps in fast context switching in a program.

ALU, FLPU

10) ALU and FLPU (Arithmetic and Logic operations Unit and Floating Point operations Unit). The FLPU has an associated floating-point register set for its operations.

CACHES

12) Instruction, data and branch target caches, with an associated PFCU (prefetch control unit) for pre-fetching the instructions, data and next branch target instructions, respectively.

Multi-way cache – Example: a 16 kB, 32-way instruction cache with 32-byte blocks, and a 16 kB data cache, in ARM.

Cache block – enables simultaneous caching of several memory locations of a set of instructions.

AOU

13) AOU (Atomic Operations Unit): an instruction is broken into a number of processor operations called atomic operations (AOs). The AOU finishes the AOs before an interrupt of the process can occur, which prevents problems arising from incomplete processor operations on shared data in the programs.

FEATURES IN MOST PROCESSORS

• Fixed instruction cycle time ─ RISC processor core

• 32-bit internal bus width – makes arithmetic operations on 32-bit operands available in a single cycle. The 32-bit bus is a necessity for signal processing and control system instructions.

• Program counter (PC) bits and its reset value

• Stack pointer bits and its initial reset value

• Instruction, branch target and data caches

• Memory management unit (MMU)

• Floating point processing unit

• System register set

• Floating point register set

• Pre-fetch control unit for pre-fetching into the I- and D-caches

• Instruction-level parallelism units

(i) Multistage pipeline

(ii) Multi-line superscalar processing

• RISC architecture: most instructions execute in a single clock cycle per instruction (by hardwired implementation of instructions)

• Use of multiple register sets, register windows or register files

• Greatly reduced ALU dependency on external memory accesses for data, owing to the reduced number of addressing modes of the RISC load-and-store architecture. Before ALU operations, the operands are loaded into registers; similarly, the result is written back to a register and then stored at the external memory address.

CYCLES

• On cycle 1, the first instruction I0 enters the instruction fetch (IF) stage of the pipeline and stops at the pipeline latch (buffer) between the instruction fetch and instruction decode (ID) stages.

• On cycle 2, the second instruction I1 enters the instruction fetch stage, while instruction I0 proceeds to the instruction decode stage.

• On cycle 3, instruction I0 enters the register (inputs) read (RR) stage, instruction I1 is in the instruction decode stage, and instruction I2 enters the instruction fetch stage.

• Instructions proceed through the pipeline at one stage per cycle until they reach the register (result) write-back (WB) stage, at which point execution of the instruction is complete.

• On cycle 6 in the example, instructions I1 through I5 are in the pipeline, while instruction I0 has completed and is no longer in the pipeline.

• The pipelined processor still executes instructions at a rate (throughput) of one instruction per cycle, but the latency of each instruction is now 5 cycles instead of 1. Each cycle period, however, is now only about 1/5 of the period without pipelining.

Thus processing performance can improve by up to about five times in a five-stage pipeline.
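The cycle-by-cycle walk-through above can be reproduced with a small sketch. It assumes an ideal pipeline with no stalls, and the stage name "EX" for the fourth stage is an assumption, since the notes name only IF, ID, RR and WB. The function names are illustrative.

```python
# Stage names for the five-stage pipeline. "EX" is an assumed name for the
# fourth stage, which the text above does not name.
STAGES = ["IF", "ID", "RR", "EX", "WB"]

def stage_of(instr, cycle):
    """Stage occupied by instruction number `instr` (0-based) on `cycle` (1-based), or None."""
    position = cycle - 1 - instr      # instruction i enters IF on cycle i + 1
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None                       # not yet fetched, or already completed

def in_flight(cycle, n_instr):
    """(name, stage) pairs for every instruction that is in the pipeline on `cycle`."""
    pairs = []
    for i in range(n_instr):
        stage = stage_of(i, cycle)
        if stage is not None:
            pairs.append((f"I{i}", stage))
    return pairs

# Print the pipeline contents for cycles 1 to 6 with six instructions I0..I5
for cycle in range(1, 7):
    print(cycle, in_flight(cycle, 6))
```

On cycle 6 the listing contains I1 through I5 only, which matches the description: I0 left the pipeline after its write-back on cycle 5.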

Instruction level parallelism (ILP)

• Execute several instructions in parallel. Two or more instructions execute in parallel as well as in a pipeline.

• With two parallel pipelines in a processor, two instructions In and In+1 execute in parallel at separate execution units.



PROCESSOR PERFORMANCE

Performance of a processor is measured in terms of the following metrics:

• MIPS: the processing speed of a processor in million instructions per second.

• MFLOPS: the processing speed of a processor or DSP in million floating-point operations per second.

• Dhrystone per second: Dhrystone is a benchmarking program developed by Reinhold P. Weicker in 1984 that measures a processor’s performance in processing integers and strings. The benchmark program is available in C, Pascal or Java; it benchmarks the CPU, not the performance of I/O or OS calls. This metric is the number of times the program can run in a second.

1 MIPS = 1757 Dhrystone / sec
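As a worked illustration of the relation above, a measured Dhrystone score can be converted into a MIPS figure. This is a minimal sketch; the function name and the sample score are hypothetical.

```python
DHRYSTONES_PER_MIPS = 1757   # relation stated above: 1 MIPS = 1757 Dhrystone/sec

def dhrystone_mips(dhrystones_per_second):
    """Convert a Dhrystone score (runs per second) into a MIPS rating."""
    return dhrystones_per_second / DHRYSTONES_PER_MIPS

# Hypothetical processor that completes 175,700 Dhrystone runs per second
print(dhrystone_mips(175_700))   # 100.0
```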



The Embedded Microprocessor Benchmark Consortium (EEMBC), originally the EDN Embedded Microprocessor Benchmark Consortium, proposed benchmark program suites for six different areas of application of embedded systems:

- Telecommunication (modems, xDSL)

- Consumer Electronics (digital cameras)

- Automotive and Industrial Electronics

- Networking (networking processors)

- Office Automation (printers, plotters)

- Digital Entertainment (PDA, cell phone)

ESSENTIAL CHARACTERISTICS OF PROCESSOR STRUCTURE

Superscalar Processing

• A superscalar processor has the capacity to fetch (instructions from memory), decode (instructions) and execute more than one instruction in parallel at any instant.

• Superscaling allows two or more instructions to be processed in parallel (full overlapping).

• Multiple units are provided for instruction processing.

• Supports pipelining.

• PowerPC MPC 601 (RISC, first PowerPC, 66 MHz, 132 MIPS)

- 3 execution units: 1 branch unit (branching), 1 integer unit, 1 floating point unit

- can dispatch up to 2 instructions and process 3 every clock cycle



The Pentium has two 5-stage pipelines to execute two instructions per clock cycle, whereas the Pentium II has a single pipeline but multiple functional units.

MICROCODE AND HARDWIRED

Microcode: Inside a CPU, instructions are decoded into a sequence of microcode instructions, which in turn call a sequence of nanocode commands that control the sequencers and the ALU. The instructions do not operate directly on the internal resources. Neither the microcode nor the nanocode is available to the programmer. The steps (known as microcodes) for processing an instruction in the CPU are the following:

o Instruction fetch from memory (IF)

o Decode instruction (ID)

o Load operands from memory (OL)

o Execute instruction (EX)

o Store results in memory (OS)

Note: (1) The fastest instructions have all operands in the CPU (as in RISC), so that a single clock cycle is needed to execute them. (2) Microcoding requires multiple cycles to load sequencers etc. and therefore cannot easily be used to implement a single-cycle execution unit.

Hardwired: In some processors (RISC) all the execution units are hardwired, i.e. instructions are executed directly by hardware and no microcoding is involved. Hence instructions are executed in a single cycle.

Pipelining

• Pipelining means dividing the ALU circuit into n substages.

• All common steps (IF, ID, OL, EX, OS) involved in instruction processing by the CPU can be pipelined.

• Each major step of the instruction processing is assigned to and handled independently by a separate subunit of the CPU pipeline.

• Pipeline stall is a disadvantage of pipelining. It is caused when any stage within the pipeline cannot complete its allotted task at the same time as its peers. This can occur when

(i) wait states are inserted into external memory access,

(ii) instructions use iterative techniques, or

(iii) there is a change in program flow (due to branching etc.).

• Branch penalty: the time required for re-processing the instructions that became redundant (executed in part at preceding stages) because of the execution of a branching instruction in a multistage pipeline.

• Data dependency penalty: the time an instruction waits before further execution when it depends on the data output of another instruction. This happens due to improper alignment of the two instructions.

CACHING

Caches are small, fast memories that hold copies of some of the contents of main memory. They provide higher-speed access for the CPU. A cache controller mediates between the CPU and the main memory.

• Cache hit: the requested location is available in the cache.

• Cache miss: the requested location is not available in the cache, resulting in a cache miss penalty (extra time needed to access the missing memory location).

A cache miss can occur for various reasons:

• Compulsory miss (cold miss): the first time a location is used (not referenced before)

• Capacity miss: the program’s working set is too large for the cache

• Conflict miss: two particular memory locations are fighting for the same cache line

The behavior of several programs running concurrently must be examined to accurately estimate performance.

CPU POWER CONSUMPTION

• Power: energy consumption per unit time. More power consumption -> more heat generation

• Battery life -> depends on energy consumption

• Power management -> concerns both energy and power consumption

• CMOS circuits: used to build all kinds of digital systems

o Voltage drops: power consumption is proportional to V^2 (so reduce the power supply voltage)

o Toggling: more power is consumed when changing states (output value). So, to reduce consumption, reduce the circuit’s operating speed and avoid unnecessary changes to the inputs of a CMOS circuit (this eliminates unnecessary glitches at the output)

o Leakage: some charge leaks through the substrate even in the inactive state of the CMOS circuit (removing the power supply stops the leakage, but more time is then needed to reconnect the supply)

Power Saving Strategies

• Use the CPU at reduced voltage levels (e.g. reducing the supply from 5 V to 3.3 V reduces power consumption by a factor of 5^2 / 3.3^2 ≈ 2.3).

• Operate the CPU at lower clock rates -> may reduce power consumption (but not energy consumption).

• Disable certain functional units that are not currently needed (reduces energy consumption).

• Allow part of the CPU to be totally disconnected from the power supply (eliminates leakage current).

• Static power management: invoked by the user, e.g. activating a power-down mode by executing an instruction. An interrupt or some other event is needed to come out of this mode; no instruction is available for exiting it.

• Dynamic power management: driven by the dynamic activity of the CPU, e.g. turning off certain sections of the CPU when the currently executing instructions do not need that particular unit or section.
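The first two strategies can be checked numerically. The sketch below assumes the usual first-order model in which CMOS switching power is proportional to V^2 * f; the notes state only the V^2 dependence, so the frequency term, the function names and the sample clock rates are assumptions made for illustration.

```python
def switching_power(voltage, frequency, c_eff=1.0):
    """Relative CMOS switching power, assumed proportional to V^2 * f."""
    return c_eff * voltage ** 2 * frequency

def task_energy(voltage, frequency, cycles, c_eff=1.0):
    """Relative energy to run a task of `cycles` clock cycles: power * time."""
    return switching_power(voltage, frequency, c_eff) * (cycles / frequency)

# Lowering the supply from 5 V to 3.3 V at the same clock rate
ratio = switching_power(5.0, 1.0) / switching_power(3.3, 1.0)
print(round(ratio, 2))   # about 2.3

# Halving the clock halves the power, but the energy for a fixed task stays the same
print(switching_power(3.3, 50e6) / switching_power(3.3, 100e6))
print(task_energy(3.3, 50e6, 1e6), task_energy(3.3, 100e6, 1e6))
```

Under this model a slower clock lowers power because the same work is spread over a longer time, which is why the energy per task (and hence battery life) does not improve.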

SELECTING PROCESSORS FOR EMBEDDED APPLICATIONS

• Instruction set

• Maximum bits in an operand (8 or 16 or 32) in a single arithmetic or logical operation

• Clock frequency in MHz

• Processing speed in MIPS / MFLOPS / Dhrystone

• Processor’s ability to run complex algorithms while meeting deadlines

================================================================================

Processor Organisation

Processor

• ALU.

• The processor circuit performs sequential operations, guided by a clock.

• Program counter and stack pointer, which point to the instruction to be fetched and to the top of the data pushed onto the stack, respectively.

• Certain processors have an on-chip memory management unit (MMU).

Registers

• General-purpose registers.

• Registers are organized onto a common internal bus of the processor. A register is 32, 16 or 8 bits wide, depending on whether the ALU performs a 32-, 16- or 8-bit operation at a time.

CISC

• A processor may have a CISC (Complex Instruction Set Computer) or RISC (Reduced Instruction Set Computer) architecture, and the choice may affect the system design.

• CISC can process complex instructions and complex data sets with fewer registers, as it provides a large number of addressing modes.



RISC

• Simpler instructions, all executing in a single cycle per instruction.

• New RISC processors, such as ARM7 and ARM9, also provide a few of the most useful CISC instructions.

• CISC converges to a RISC implementation because most instructions are hardwired and execute in a single clock cycle.

Interrupts

• Processor provides for the inputs for external interrupts so that the external circuits can send the interrupt

signals

• May possess an internal interrupt controller (handler) to program the service routine priorities and to

allocate vector addresses.

Memory

Most modern computer systems are designed on the basis of an architecture called the von Neumann architecture (Note 1).

Note 1: The so-called von Neumann architecture is a model for a computing machine that uses a single storage structure to hold both the set of instructions on how to perform the computation and the data required or generated by the computation. Such machines are also known as stored-program computers. The separation of storage from the processing unit is implicit in this model. By treating the instructions in the same way as the data, a stored-program machine can easily change the instructions. In other words, the machine is reprogrammable. One important motivation for such a facility was the need for a program to increment or otherwise modify the address portion of instructions. This became less important when index registers and indirect addressing became customary features of machine architecture.

The memory stores instructions as well as data. Nothing in the memory itself distinguishes an instruction from data; the CPU has to be directed to the addresses of the instruction codes.



The memory is connected to the CPU through the following lines

1. Address

2. Data

3. Control

In a memory read operation the CPU places the address on the address bus. In most cases these lines are fed to a decoder, which selects the proper memory location. The CPU then sends a read control signal, and the data stored in that location is transferred to the processor via the data lines.

In a memory write operation, after the address is loaded, the CPU sends the write control signal followed by the data to the requested memory location.

Memory can be classified in various ways, e.g. by location, power consumption or way of data storage. At the basic level, memory can be classified as

1. Processor Memory (Register Array)

2. Internal on-chip Memory

3. Primary Memory

4. Cache Memory

5. Secondary Memory

Processor Memory (Register Array)

Most processors have some registers associated with the arithmetic logic unit. They store the operands and the result of an instruction. Data transfers are much faster and need no additional clock cycles. The number of registers varies from processor to processor. The more registers there are, the faster the instruction execution, but the complexity of the architecture puts a limit on the amount of processor memory.

Internal on-chip Memory

Some processors contain a block of memory locations on the chip. They are treated in the same way as external memory, but access is very fast.

Primary Memory

This is the memory that sits just outside the CPU. It can also reside on the same chip as the CPU. These memories can be static or dynamic.

Cache Memory

This is situated between the processor and the primary memory. It serves as a buffer for the instructions or data that the processor is expected to need next. There can be more than one level of cache memory.

Secondary Memory

These are generally treated as input/output devices. They are much cheaper, slower mass-storage devices connected through input/output interface circuits. They are generally magnetic or optical memories such as hard disk and CD-ROM devices.

The memory can also be divided into volatile and non-volatile memory.

Volatile Memory

The contents are erased when the power is switched off. Semiconductor random access memories fall into this category.

Non-volatile Memory

The contents remain intact even if the power is switched off. Magnetic memories (hard disks), optical disks (CD-ROMs) and read-only memories (ROM) fall under this category.



Data Storage

An m x n memory stores m words of n bits each. One word is located at one address; therefore, to address m words we need

k = log2(m) address input signals

or, equivalently, k address lines can address m = 2^k words.

Example: 4,096 x 8 memory

• 32,768 bits

• 12 address input signals

• 8 input/output data signals
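The figures in this example follow directly from the relation between word count and address lines. A minimal sketch that reproduces them (the function names are illustrative):

```python
def address_lines(words):
    """Smallest k such that 2**k >= words, i.e. the address lines needed."""
    return max(words - 1, 0).bit_length()

def memory_summary(words, bits_per_word):
    """Total storage and signal counts for a words x bits_per_word memory."""
    return {
        "total_bits": words * bits_per_word,
        "address_lines": address_lines(words),
        "data_lines": bits_per_word,
    }

print(memory_summary(4096, 8))
# {'total_bits': 32768, 'address_lines': 12, 'data_lines': 8}
```

When the word count is not a power of two, the function rounds up, since a partly used address line is still needed.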

Memory Access

A memory location is accessed by placing its address on the address lines. The read/write control line selects a read or a write. Some memory devices are multi-port, i.e. they allow multiple simultaneous accesses to different locations.



Memory Specifications

The specification of a typical memory is as follows:

Storage capacity: the number of bits, bytes or words it can store.

Memory access time (read access and write access): how long the memory takes to place data on its data lines after it has been addressed, or how fast it can store data supplied through its data lines. The reciprocal of the memory access time is known as the memory bandwidth.

Power consumption and voltage levels: power consumption is a major factor in embedded systems. The lower the power consumption, the higher the possible packing density.

Size: size is directly related to the power consumption and data storage capacity.

Four generations of RAM chips

There are two important specifications for memory as far as real-time embedded systems are concerned:

– Write ability

– Storage permanence



Write ability

It is the manner and speed with which a particular memory can be written.

• Ranges of write ability

– High end: processor writes to memory simply and quickly, e.g., RAM

– Middle range: processor writes to memory, but more slowly, e.g., FLASH, EEPROM (Electrically Erasable and Programmable Read Only Memory)

– Lower range: special equipment, a “programmer”, must be used to write to memory, e.g., EPROM, OTP ROM (One Time Programmable Read Only Memory)

– Low end: bits stored only during fabrication, e.g., mask-programmed ROM

• In-system programmable memory

– Can be written to by a processor in the embedded system using the memory

– Memories in the high end and middle range of write ability

Storage permanence

It is the ability to hold the stored bits.

• Range of storage permanence

– High end: essentially never loses bits, e.g., mask-programmed ROM

– Middle range: holds bits for days, months or years after the memory’s power source is turned off, e.g., NVRAM

– Lower range: holds bits as long as power is supplied to the memory, e.g., SRAM

– Low end: begins to lose bits almost immediately after they are written, e.g., DRAM

• Nonvolatile memory

– Holds bits after power is no longer supplied

– High end and middle range of storage permanence

Common Memory Types

Read Only Memory (ROM)

This is a nonvolatile memory. It can only be read from, not written to, by a processor in an embedded system. Traditionally it is written to (“programmed”) before being inserted into the embedded system.



Uses

– Store software program for general-purpose processor

• program instructions can be one or more ROM words

– Store constant data needed by system

– Implement combinational circuit

EPROM: Erasable Programmable ROM

The programmable component is a MOS transistor with a “floating” gate surrounded by an insulator. Negative charges form a channel between source and drain, storing a logic 1. A large positive voltage at the gate causes negative charges to move out of the channel and become trapped in the floating gate, storing a logic 0. To erase, UV rays are shone on the surface of the floating gate, which causes the negative charges to return to the channel from the floating gate, restoring the logic 1. An EPROM package has a quartz window through which UV light can pass. The EPROM has

• Better write ability

– can be erased and reprogrammed thousands of times

• Reduced storage permanence

– program lasts about 10 years but is susceptible to radiation and electric noise

• Typically used during design development

EEPROM

EEPROM stands for Electrically Erasable and Programmable Read Only Memory. It is typically erased by using a higher than normal voltage. It can program and erase individual words, unlike EPROMs, where exposure to UV light erases everything. It has

• Better write ability

– can be in-system programmable, with a built-in circuit to provide the higher than normal voltage

• built-in memory controller commonly used to hide details from the memory user

– writes are very slow due to erasing and programming

• “busy” pin indicates to the processor that the EEPROM is still writing

– can be erased and programmed tens of thousands of times

• Similar storage permanence to EPROM (about 10 years)

• Far more convenient than EPROMs, but more expensive

Flash Memory It is an extension of EEPROM. It has the same floating gate principle and same write ability and storage

permanence. It can be erased at a faster rate i.e. large blocks of memory erased at once, rather than one word at a

time. The blocks are typically several thousand bytes large

• Writes to single words may be slower

– Entire block must be read, word updated, then entire block written back

• Used with embedded systems storing large data items in nonvolatile memory

– e.g., digital cameras, TV set-top boxes, cell phones

RAM: “Random-access” memory

• Typically volatile memory

– bits are not held without power supply

• Read and written to easily by embedded system during execution

• Internal structure more complex than ROM

– a word consists of several memory cells, each storing 1 bit

– each input and output data line connects to each cell in its column

– rd/wr connected to every cell

– when row is enabled by decoder, each cell has logic that stores input data bit when rd/wr indicates write

or outputs stored bit when rd/wr indicates read



Basic types of RAM

• SRAM: Static RAM

– Memory cell uses flip-flop to store bit

– Requires 6 transistors

– Holds data as long as power supplied

• DRAM: Dynamic RAM

– Memory cell uses MOS transistor and capacitor to store bit



– More compact than SRAM

– “Refresh” required due to capacitor leak

• word’s cells refreshed when read

– Typical refresh rate 15.625 microsec.

– Slower to access than SRAM

RAM variations

• PSRAM: Pseudo-static RAM

– DRAM with built-in memory refresh controller

– Popular low-cost high-density alternative to SRAM

• NVRAM: Nonvolatile RAM

– Holds data after external power removed

– Battery-backed RAM

• SRAM with own permanently connected battery

• writes as fast as reads

• no limit on number of writes unlike nonvolatile ROM-based memory

– SRAM with EEPROM or flash: stores the complete RAM contents on EEPROM or flash before power is turned off

Composing memory

• Memory size needed often differs from size of readily available memories

• When available memory is larger, simply ignore unneeded high-order address bits and higher data lines

• When available memory is smaller, compose several smaller memories into one larger memory

– Connect side-by-side to increase width of words

– Connect top to bottom to increase number of words

• added high-order address line selects smaller memory containing desired word using a decoder

– Combine techniques to increase number and width of words
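The two composition techniques can be sketched in software. In the model below, chips placed side by side each hold one slice of a wider word, and the high-order address bits select which row of chips holds the word, acting as the decoder. The class names, the chip sizes and the choice of column 0 for the low-order bits are all assumptions made for illustration.

```python
class MemoryChip:
    """A small words x width memory device."""
    def __init__(self, words, width):
        self.width = width
        self.cells = [0] * words

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        self.cells[addr] = value & ((1 << self.width) - 1)   # keep only `width` bits


class ComposedMemory:
    """Grid of identical chips: rows add more words, columns add word width."""
    def __init__(self, rows, cols, chip_words, chip_width):
        self.chip_words = chip_words
        self.chip_width = chip_width
        self.grid = [[MemoryChip(chip_words, chip_width) for _ in range(cols)]
                     for _ in range(rows)]

    def write(self, addr, value):
        # High-order part of the address picks the row of chips (decoder),
        # the low-order part is the address inside each chip.
        row, local = divmod(addr, self.chip_words)
        for col, chip in enumerate(self.grid[row]):
            chip.write(local, value >> (col * self.chip_width))

    def read(self, addr):
        row, local = divmod(addr, self.chip_words)
        value = 0
        for col, chip in enumerate(self.grid[row]):
            value |= chip.read(local) << (col * self.chip_width)
        return value


# Four 1K x 8 chips composed into a 2K x 16 memory
mem = ComposedMemory(rows=2, cols=2, chip_words=1024, chip_width=8)
mem.write(1500, 0xBEEF)
print(hex(mem.read(1500)))
```

Address 1500 lies in the second row of chips at local address 476, with the low byte in one chip and the high byte in its neighbor.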





Memory Hierarchy

The objective is to have inexpensive, fast memory.

• Main memory

– Large, inexpensive, slow memory stores entire program and data

• Cache

– Small, expensive, fast memory stores copy of likely accessed parts of larger memory

– Can be multiple levels of cache

Cache

• Usually designed with SRAM

– faster but more expensive than DRAM

• Usually on same chip as processor

– space limited, so much smaller than off-chip main memory

– faster access (1 cycle vs. several cycles for main memory)

• Cache operation

– Request for main memory access (read or write)

– First, check cache for copy

– cache hit

- copy is in cache, quick access

– cache miss

- copy not in cache, read address and possibly its neighbors into cache

• Several cache design choices

– cache mapping, replacement policies, and write techniques

Cache Mapping

• Mapping is necessary because there are far fewer cache addresses available than main memory addresses

• Are an address’s contents in the cache?

• Cache mapping is used to assign a main memory address to a cache address and to determine hit or miss

• Three basic techniques:

– Direct mapping

– Fully associative mapping

– Set-associative mapping

• Caches are partitioned into indivisible blocks or lines of adjacent memory addresses

– usually 4 or 8 addresses per line

DIRECT MAPPING

• Main memory address divided into 2 fields

– Index which contains

- cache address

- number of bits determined by cache size

– Tag

- compared with tag stored in cache at address indicated by index

- if tags match, check valid bit

• Valid bit

– indicates whether data in slot has been loaded from memory

• Offset

– used to find particular word in cache line
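The fields described above can be illustrated with a small model of a direct-mapped cache. This is a sketch under assumed parameters (8 lines of 4 addresses each); it tracks only tags and valid bits, not the cached data, and the class and method names are hypothetical.

```python
class DirectMappedCache:
    """Tag/valid bookkeeping of a direct-mapped cache (no data storage)."""
    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines          # number of cache lines (index range)
        self.line_size = line_size          # addresses per line (offset range)
        self.valid = [False] * num_lines
        self.tags = [None] * num_lines

    def split(self, addr):
        """Split a main memory address into (tag, index, offset)."""
        offset = addr % self.line_size
        index = (addr // self.line_size) % self.num_lines
        tag = addr // (self.line_size * self.num_lines)
        return tag, index, offset

    def access(self, addr):
        """Return True on a hit; on a miss, load the line and return False."""
        tag, index, _ = self.split(addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True        # line now holds data from memory
            self.tags[index] = tag
        return hit


cache = DirectMappedCache(num_lines=8, line_size=4)
for addr in (0, 3, 32, 0):
    print(addr, cache.split(addr), "hit" if cache.access(addr) else "miss")
```

Addresses 0 and 32 share index 0 but have different tags, so they evict each other: the final access to address 0 misses again even though the rest of the cache is empty.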



Fully Associative Mapping

• Complete main memory address stored in each cache address

• All addresses stored in cache simultaneously compared with desired address

• Valid bit and offset same as direct mapping

Set-Associative Mapping

• Compromise between direct mapping and fully associative mapping

• Index same as in direct mapping

• But each cache address contains the content and tags of 2 or more memory address locations

• Tags of that set simultaneously compared as in fully associative mapping

• Cache with set size N called N-way set-associative

– 2-way, 4-way, 8-way are common

Cache-Replacement Policy

• Technique for choosing which block to replace

– when fully associative cache is full

– when set-associative cache’s line is full

• Direct mapped cache has no choice

• Random

– replace block chosen at random

• LRU: least-recently used

– replace block not accessed for longest time

• FIFO: first-in-first-out

– push block onto queue when accessed

– choose block to replace by popping queue
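The difference between LRU and FIFO shows up on even a short access sequence. The sketch below models a fully associative cache that holds a fixed number of blocks and counts misses under each policy. It assumes that FIFO enqueues a block when it is brought into the cache; the function names and the access sequence are illustrative.

```python
from collections import OrderedDict, deque

def count_misses_lru(blocks, capacity):
    """Misses for a fully associative cache with least-recently-used replacement."""
    cache = OrderedDict()                 # oldest use first, most recent use last
    misses = 0
    for block in blocks:
        if block in cache:
            cache.move_to_end(block)      # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False) # evict the least recently used block
            cache[block] = True
    return misses

def count_misses_fifo(blocks, capacity):
    """Misses for a fully associative cache with first-in-first-out replacement."""
    cache = set()
    queue = deque()                       # order in which blocks were loaded
    misses = 0
    for block in blocks:
        if block not in cache:
            misses += 1
            if len(cache) == capacity:
                cache.discard(queue.popleft())   # evict the oldest loaded block
            cache.add(block)
            queue.append(block)
    return misses

sequence = [1, 2, 3, 1, 4, 1, 2]
print(count_misses_lru(sequence, 3), count_misses_fifo(sequence, 3))   # 5 6
```

LRU keeps block 1 because it was just reused, while FIFO evicts it as the oldest arrival and then has to reload it.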



Cache Write Techniques

• When written, data cache must update main memory

• Write-through

– write to main memory whenever cache is written to

– easiest to implement

– processor must wait for slower main memory write

– potential for unnecessary writes

• Write-back

– main memory only written when “dirty” block replaced

– extra dirty bit for each block set when cache block written to

– reduces number of slow main memory writes
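The saving from write-back can be seen by counting main memory writes for a sequence of stores. The sketch assumes a direct-mapped cache that allocates a line on every write and counts one memory write per flushed block; dirty blocks still in the cache at the end are not counted. The function name and the sample sequence are hypothetical.

```python
def memory_writes(write_lines, num_lines, policy):
    """Count main memory writes caused by a sequence of writes to memory lines.

    write_lines: memory line numbers being written, in order
    num_lines:   number of lines in the (direct-mapped) cache
    policy:      "through" for write-through, "back" for write-back
    """
    resident = {}     # cache index -> (memory line held, dirty bit)
    writes = 0
    for line in write_lines:
        index = line % num_lines
        held = resident.get(index)
        if held is not None and held[0] != line and held[1]:
            writes += 1                       # dirty block replaced: write it back
        if policy == "through":
            writes += 1                       # every cache write also goes to memory
            resident[index] = (line, False)
        else:
            resident[index] = (line, True)    # just set the dirty bit
    return writes

sequence = [0, 0, 0, 4, 0]    # lines 0 and 4 compete for the same cache line
print(memory_writes(sequence, 4, "through"), memory_writes(sequence, 4, "back"))   # 5 2
```

The three consecutive writes to line 0 cost three memory writes with write-through but only one with write-back, paid when the dirty block is replaced.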

Cache Impact on System Performance

• Most important parameters in terms of performance:

– Total size of cache

- total number of data bytes cache can hold

- tag, valid and other housekeeping bits not included in total

– Degree of associativity

– Data block size

• Larger caches achieve lower miss rates but higher access cost, e.g.,

– 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles

- avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles

– 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change

- avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)

– 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change

- avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse)
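The three averages above use the same weighting: the hit cost counts for the fraction of accesses that hit and the miss cost for the fraction that miss. A minimal sketch that recomputes them (the function and variable names are illustrative):

```python
def average_access_cost(miss_rate, hit_cost, miss_cost):
    """Average cycles per memory access: hits and misses weighted by their rates."""
    return (1 - miss_rate) * hit_cost + miss_rate * miss_cost

MISS_COST = 20   # cycles, the same for all three cache sizes
configs = {
    "2 Kbyte": (0.15, 2),       # (miss rate, hit cost in cycles)
    "4 Kbyte": (0.065, 3),
    "8 Kbyte": (0.05565, 4),
}

costs = {name: average_access_cost(rate, hit, MISS_COST)
         for name, (rate, hit) in configs.items()}
for name, cost in costs.items():
    print(name, round(cost, 4))

print("best:", min(costs, key=costs.get))
```

The 4 Kbyte cache comes out best here: going to 8 Kbyte lowers the miss rate only slightly, and the extra cycle of hit cost outweighs that gain.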

Cache Performance Trade-Offs

• Improving cache hit rate without increasing size

– Increase line size

– Change set-associativity



Advanced RAM

• DRAMs commonly used as main memory in processor-based embedded systems

– high capacity, low cost

• Many variations of DRAMs proposed

– need to keep pace with processor speeds

– FPM DRAM: fast page mode DRAM

– EDO DRAM: extended data out DRAM

– SDRAM/ESDRAM: synchronous and enhanced synchronous DRAM

– RDRAM: Rambus DRAM

Basic DRAM

• Address bus multiplexed between row and column components

• Row and column addresses are latched in, sequentially, by strobing ras (row address strobe) and cas

(column address strobe) signals, respectively

• Refresh circuitry can be external or internal to DRAM device

– strobes consecutive memory addresses periodically, causing memory content to be refreshed

– refresh circuitry disabled during read or write operation



Fast Page Mode DRAM (FPM DRAM)

• Each row of memory bit array is viewed as a page

• Page contains multiple words

• Individual words addressed by column address

• Timing diagram:

– row (page) address sent

– 3 words read consecutively by sending column address for each

• Extra cycle eliminated on each read/write of words from the same page
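The multiplexed row/column addressing and the saving from page mode can be sketched together. The model below splits an address into row and column parts and counts how many row address strobes a sequence of accesses needs with and without page mode. The 10-bit column width, the function names and the sample addresses are assumptions made for illustration.

```python
def split_dram_address(addr, col_bits):
    """Split an address into the (row, column) parts sent over the multiplexed bus."""
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return row, col

def ras_strobes(addresses, col_bits, page_mode):
    """Count row address strobes needed for a sequence of accesses.

    Without page mode every access sends a row address; with page mode the
    row (page) stays open and only a change of row needs a new strobe.
    """
    strobes = 0
    open_row = None
    for addr in addresses:
        row, _ = split_dram_address(addr, col_bits)
        if not page_mode or row != open_row:
            strobes += 1
            open_row = row
    return strobes

print(split_dram_address(0x12345, 10))      # (72, 837)

accesses = [0, 1, 2, 1024, 1025]            # three words in row 0, two in row 1
print(ras_strobes(accesses, 10, page_mode=False),
      ras_strobes(accesses, 10, page_mode=True))   # 5 2
```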



Extended data out DRAM (EDO DRAM)

• Improvement of FPM DRAM

• Extra latch before output buffer

– allows strobing of cas before data read operation completed

• Reduces read/write latency by an additional cycle

(S)ynchronous and Enhanced Synchronous (ES) DRAM

• SDRAM latches data on active edge of clock

• Eliminates time to detect ras/cas and rd/wr signals

• A counter is initialized to column address then incremented on active edge of clock to access consecutive

memory locations

• ESDRAM improves SDRAM

– added buffers enable overlapping of column addressing

– faster clocking and lower read/write latency possible



Rambus DRAM (RDRAM)

• More of a bus interface architecture than DRAM architecture

• Data is latched on both rising and falling edge of clock

• Broken into 4 banks each with own row decoder

– can have 4 pages open at a time

• Capable of very high throughput

DRAM Integration Problem

• SRAM easily integrated on same chip as processor

• DRAM more difficult

– Different chip-making process between DRAM and conventional logic

– Goal of conventional logic (IC) designers:

- minimize parasitic capacitance to reduce signal propagation delays and power consumption

– Goal of DRAM designers:

- create capacitor cells to retain stored information

– Integration processes beginning to appear

Memory Management Unit (MMU)

• Duties of MMU

o Handles DRAM refresh, bus interface and arbitration

o Takes care of memory sharing among multiple processors

o Translates logical memory addresses from the processor to physical memory addresses of DRAM

• Modern CPUs often come with an MMU built in

• Otherwise, a separate single-purpose processor can be used as the MMU
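
The address-translation duty listed above can be illustrated with a minimal sketch. It assumes a simple paged scheme; the 4 KB page size, the page_table contents and the function name translate are hypothetical, and a real MMU may use a different mechanism (for example segmentation or multi-level tables).

```python
PAGE_SIZE = 4096  # assumed page size in bytes

# Hypothetical page table: logical page number -> physical frame number.
page_table = {0: 5, 1: 2, 2: 7}


def translate(logical_addr):
    """Translate a logical address to a physical address by replacing the
    page number with its frame number and keeping the offset unchanged."""
    page, offset = divmod(logical_addr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault: logical page {page} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


print(hex(translate(0x1234)))
```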

DMA Controller Introduction

Direct Memory Access (DMA) allows devices to transfer data without subjecting the processor to a heavy

overhead. Otherwise, the processor would have to copy each piece of data from the source to the destination.

This is typically slower than copying normal blocks of memory, since access to I/O devices over a peripheral bus

is generally slower than access to normal system RAM. During this time the processor would be unavailable for any other

tasks involving processor bus access, although it could continue with any work that does not require bus access.



DMA transfers are essential for high-performance embedded systems where large chunks of data need to be

transferred between the input/output devices and the primary memory.

DMA Controller

A DMA controller is a device, usually peripheral to a CPU, that is programmed to perform a sequence of

data transfers on behalf of the CPU. A DMA controller can directly access memory and is used to transfer data

from one memory location to another, or from an I/O device to memory and vice versa. A DMA controller

manages several DMA channels, each of which can be programmed to perform a sequence of these DMA

transfers. Devices, usually I/O peripherals, that acquire data that must be read (or devices that must output data

and be written to) signal the DMA controller to perform a DMA transfer by asserting a hardware DMA request

(DRQ) signal. A DMA request signal for each channel is routed to the DMA controller. This signal is monitored

and responded to in much the same way that a processor handles interrupts. When the DMA controller sees a

DMA request, it responds by performing one or many data transfers from that I/O device into system memory or

vice versa. Channels must be enabled by the processor for the DMA controller to respond to DMA requests. The

number of transfers performed, transfer modes used, and memory locations accessed depend on how the DMA

channel is programmed. A DMA controller typically shares the system memory and I/O bus with the CPU and

has both bus master and slave capability. Fig. shows the DMA controller architecture and how the DMA

controller interacts with the CPU. In bus master mode, the DMA controller acquires the system bus (address,

data, and control lines) from the CPU to perform the DMA transfers. Because the CPU releases the system bus

for the duration of the transfer, the process is sometimes referred to as cycle stealing.

In bus slave mode, the DMA controller is accessed by the CPU, which programs the DMA controller's

internal registers to set up DMA transfers. The internal registers consist of source and destination address

registers and transfer count registers for each DMA channel, as well as control and status registers for initiating,

monitoring, and sustaining the operation of the DMA controller.



DMA ARCHITECTURE



DMA Transfer Types and Modes

DMA controllers vary as to the type of DMA transfers and the number of DMA channels they support. The two

types of DMA transfers are flyby DMA transfers and fetch-and-deposit DMA transfers. The three common

transfer modes are single, block, and demand transfer modes. These DMA transfer types and modes are

described in the following paragraphs. The fastest DMA transfer type is referred to as a single-cycle, single-

address, or flyby transfer. In a flyby DMA transfer, a single bus operation is used to accomplish the transfer,

with data read from the source and written to the destination simultaneously. In flyby operation, the device

requesting service asserts a DMA request on the appropriate channel request line of the DMA controller. The

DMA controller responds by gaining control of the system bus from the CPU and then issuing the pre-

programmed memory address. Simultaneously, the DMA controller sends a DMA acknowledge signal to the

requesting device. This signal alerts the requesting device to drive the data onto the system data bus or to latch

the data from the system bus, depending on the direction of the transfer. In other words, a flyby DMA transfer

looks like a memory read or write cycle with the DMA controller supplying the address and the I/O device

reading or writing the data. Because flyby DMA transfers involve a single memory cycle per data transfer, these

transfers are very efficient. Fig. shows the flyby DMA transfer signal protocol.

The second type of DMA transfer is referred to as a dual-cycle, dual-address, flow-through, or

fetch-and-deposit DMA transfer. As these names imply, this type of transfer involves two memory or I/O cycles.

The data being transferred is first read from the I/O device or memory into a temporary data register internal to

the DMA controller. The data is then written to the memory or I/O device in the next cycle. Fig. shows the

fetch-and-deposit DMA transfer signal protocol. Although inefficient because the DMA controller performs two

cycles and thus retains the system bus longer, this type of transfer is useful for interfacing devices with different

data bus sizes. For example, a DMA controller can perform two 16-bit read operations from one location

followed by a 32-bit write operation to another location. A DMA controller supporting this type of transfer has

two address registers per channel (source address and destination address) and bus-size registers, in addition to

the usual transfer count and control registers.

Unlike the flyby operation, this type of DMA transfer is suitable for both memory-to-memory and I/O transfers.
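
The bus-width conversion in the example above (two 16-bit reads followed by one 32-bit write) can be sketched as follows. The function name and the choice to place the first halfword in the low half of the word are assumptions made for illustration; the actual ordering depends on the controller and the byte order of the system.

```python
def fetch_and_deposit(halfwords):
    """Model a fetch-and-deposit transfer from a 16-bit source to a
    32-bit destination. Returns the 32-bit words written and the number
    of bus cycles used (two reads plus one write per word)."""
    if len(halfwords) % 2:
        raise ValueError("need an even number of 16-bit values")
    words = []
    bus_cycles = 0
    for i in range(0, len(halfwords), 2):
        temp = halfwords[i] & 0xFFFF                # first 16-bit read
        temp |= (halfwords[i + 1] & 0xFFFF) << 16   # second 16-bit read
        bus_cycles += 2
        words.append(temp)                          # one 32-bit write
        bus_cycles += 1
    return words, bus_cycles


words, cycles = fetch_and_deposit([0x1234, 0xABCD])
print([hex(w) for w in words], cycles)
```

The temporary variable plays the role of the data register internal to the DMA controller.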

Single, block, and demand are the most common transfer modes. Single transfer mode transfers one data

value for each DMA request assertion. This mode is the slowest method of transfer because it requires the DMA

controller to arbitrate for the system bus with each transfer. This arbitration is not a major problem on a lightly

loaded bus, but it can lead to latency problems when multiple devices are using the bus. Block and demand

transfer modes increase system throughput by allowing the DMA controller to perform multiple DMA transfers

when the DMA controller has gained the bus. For block mode transfers, the DMA controller performs the entire

DMA sequence as specified by the transfer count register at the fastest possible rate in response to a single DMA

request from the I/O device. For demand mode transfers, the DMA controller performs DMA transfers at the

fastest possible rate as long as the I/O device asserts its DMA request. When the I/O device deasserts this DMA

request, transfers are held off.
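
The difference between the three modes is largely a matter of how often the DMA controller must arbitrate for the bus. The sketch below counts arbitrations under simplifying assumptions: the function name and parameters are hypothetical, and demand mode is assumed to release the bus each time the device deasserts its request and to re-arbitrate when the request is asserted again.

```python
def bus_arbitrations(mode, transfer_count, drq_assertions=1):
    """Return how many times the controller must win the bus to move
    transfer_count data values in the given transfer mode."""
    if transfer_count <= 0:
        return 0
    if mode == "single":
        # One arbitration for every data value.
        return transfer_count
    if mode == "block":
        # A single request moves the whole programmed count.
        return 1
    if mode == "demand":
        # One arbitration per stretch during which the request stays asserted.
        return min(drq_assertions, transfer_count)
    raise ValueError(f"unknown transfer mode: {mode}")


for mode in ("single", "block", "demand"):
    print(mode, bus_arbitrations(mode, 64, drq_assertions=4))
```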



DMA Controller Operation

For each channel, the DMA controller saves the programmed address and count in the base registers and

maintains copies of the information in the current address and current count registers, as shown in Fig.16.1. Each

DMA channel is enabled and disabled via a DMA mask register. When DMA is started by writing to the base

registers and enabling the DMA channel, the current registers are loaded from the base registers. With each

DMA transfer, the value in the current address register is driven onto the address bus, and the current address

register is automatically incremented or decremented. The current count register determines the number of

transfers remaining and is automatically decremented after each transfer. When the value in the current count

register goes from 0 to -1, a terminal count (TC) signal is generated, which signifies the completion of the DMA

transfer sequence. This termination event is referred to as reaching terminal count. DMA controllers often

generate a hardware TC pulse during the last cycle of a DMA transfer sequence. This signal can be monitored by

the I/O devices participating in the DMA transfers. DMA controllers require reprogramming when a DMA

channel reaches TC. Thus, DMA controllers require some CPU time, but far less than is required for the CPU to

service device I/O interrupts. When a DMA channel reaches TC, the processor may need to reprogram the

controller for additional DMA transfers. Some DMA controllers interrupt the processor whenever a channel

terminates. DMA controllers also have mechanisms for automatically reprogramming a DMA channel when the

DMA transfer sequence completes. These mechanisms include auto initialization and buffer chaining. The auto

initialization feature repeats the DMA transfer sequence by reloading the DMA channel's current registers from

the base registers at the end of a DMA sequence and re-enabling the channel. Buffer chaining is useful for

transferring blocks of data into noncontiguous buffer areas or for handling double-buffered data acquisition.

With buffer chaining, a channel interrupts the CPU and is programmed with the next address and count

parameters while DMA transfers are being performed on the current buffer. Some DMA controllers minimize

CPU intervention further by having a chain address register that points to a chain control table in memory. The

DMA controller then loads its own channel parameters from memory. Generally, the more sophisticated the

DMA controller, the less servicing the CPU has to perform.
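
The register behaviour described above (base and current registers, terminal count, auto initialization) can be modelled with a small class. This is a sketch, not the register interface of any real controller: the class and attribute names are hypothetical. It follows the text in signalling TC when the count goes from 0 to -1, so a programmed count of N yields N + 1 transfers.

```python
class DmaChannel:
    """Toy model of one DMA channel."""

    def __init__(self, base_address, base_count, auto_init=False, step=1):
        self.base_address = base_address
        self.base_count = base_count
        self.auto_init = auto_init
        self.step = step              # +1 to increment, -1 to decrement
        self.current_address = base_address
        self.current_count = base_count
        self.masked = True            # channel disabled until enabled
        self.terminal_count = False

    def _load_current_from_base(self):
        self.current_address = self.base_address
        self.current_count = self.base_count

    def enable(self):
        """Start DMA: load the current registers and clear the mask."""
        self._load_current_from_base()
        self.terminal_count = False
        self.masked = False

    def transfer(self):
        """Perform one transfer; return the address driven onto the bus,
        or None if the channel is masked."""
        if self.masked:
            return None
        address = self.current_address
        self.current_address += self.step
        self.current_count -= 1
        if self.current_count < 0:    # count went from 0 to -1
            self.terminal_count = True
            if self.auto_init:
                self._load_current_from_base()   # repeat the sequence
            else:
                self.masked = True    # needs reprogramming by the CPU
        return address


channel = DmaChannel(0x1000, 3)
channel.enable()
print([channel.transfer() for _ in range(5)], channel.terminal_count)
```

With auto_init=True the channel keeps running after TC and wraps back to the base address, which is the behaviour useful for repeated acquisition into the same buffer.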

A DMA controller has one or more status registers that are read by the CPU to determine the state of

each DMA channel. The status register typically indicates whether a DMA request is asserted on a channel and

whether a channel has reached TC. Reading the status register often clears the terminal count information in the

register, which leads to problems when multiple programs are trying to use different DMA channels.

Steps in a Typical DMA cycle

A device wishing to perform DMA asserts the processor's bus request signal.

1. Processor completes the current bus cycle and then asserts the bus grant signal to the device.

2. The device then asserts the bus grant ack signal.



3. The processor senses the change in the state of the bus grant ack signal and starts listening to the data and

address bus for DMA activity.

4. The DMA device performs the transfer from the source to destination address.

5. During these transfers, the processor monitors the addresses on the bus and checks if any location

modified during DMA operations is cached in the processor. If the processor detects a cached address on

the bus, it can take one of the two actions:

o Processor invalidates the internal cache entry for the address involved in DMA write operation

o Processor updates the internal cache when a DMA write is detected

6. Once the DMA operations have been completed, the device releases the bus by asserting the bus release

signal.

7. Processor acknowledges the bus release and resumes its bus cycles from the point it left off.
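
The two cache actions in step 5 can be illustrated with a small sketch. Memory and cache are modelled as plain dictionaries, and the function name and the policy argument are hypothetical; a real processor performs this snooping in hardware.

```python
def dma_write(memory, cache, address, value, policy="invalidate"):
    """Model a DMA write that the processor observes on the bus."""
    if policy not in ("invalidate", "update"):
        raise ValueError(f"unknown snoop policy: {policy}")
    memory[address] = value            # the DMA device writes to memory
    if address in cache:               # processor detects a cached address
        if policy == "invalidate":
            del cache[address]         # drop the stale cache entry
        else:
            cache[address] = value     # refresh the cache entry in place


memory = {0x100: 1, 0x200: 9}
cache = {0x100: 1, 0x200: 9}
dma_write(memory, cache, 0x100, 7)
print(memory, cache)
```

Either action prevents the processor from later reading a stale copy of a location that DMA has changed.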



Signal Description (82C37A DMA Controller)

VCC: The +5V power supply pin.

GND: Ground.

CLK: CLOCK INPUT: The Clock Input is used to generate the timing signals which control 82C37A

operations.

CS: CHIP SELECT: Chip Select is an active low input used to enable the controller onto the data bus for CPU

communications.

RESET: This is an active high input which clears the Command, Status, Request, and Temporary registers, the

First/Last Flip-Flop, and the mode register counter. The Mask register is set to ignore requests. Following a

Reset, the controller is in an idle cycle.

READY: This signal can be used to extend the memory read and write pulses from the 82C37A to accommodate

slow memories or I/O devices.

HLDA: HOLD ACKNOWLEDGE: The active high Hold Acknowledge from the CPU indicates that it has

relinquished control of the system busses.

DREQ0-DREQ3: DMA REQUEST: The DMA Request (DREQ) lines are individual asynchronous channel

request inputs used by peripheral circuits to obtain DMA service. In Fixed Priority, DREQ0 has the highest

priority and DREQ3 has the lowest priority. A request is generated by activating the DREQ line of a channel.

DACK will acknowledge the recognition of a DREQ signal. Polarity of DREQ is programmable. RESET

initializes these lines to active high. DREQ must be maintained until the corresponding DACK goes active.

DREQ will not be recognized while the clock is stopped. Unused DREQ inputs should be pulled High or Low

(inactive) and the corresponding mask bit set.

DB0-DB7: DATA BUS: The Data Bus lines are bidirectional three-state signals connected to the system data

bus. The outputs are enabled in the Program condition during the I/O Read to output the contents of a register to

the CPU. The outputs are disabled and the inputs are read during an I/O Write cycle when the CPU is

programming the 82C37A control registers. During DMA cycles, the most significant 8-bits of the address are

output onto the data bus to be strobed into an external latch by ADSTB. In memory-to-memory operations, data

from the memory enters the 82C37A on the data bus during the read-from-memory transfer, then during the

write-to-memory transfer, the data bus outputs write the data into the new memory location.

IOR: READ: I/O Read is a bidirectional active low three-state line. In the Idle cycle, it is an input control signal

used by the CPU to read the control registers. In the Active cycle, it is an output control signal used by the

82C37A to access data from the peripheral during a DMA Write transfer.



IOW: WRITE: I/O Write is a bidirectional active low three-state line. In the Idle cycle, it is an input control

signal used by the CPU to load information into the 82C37A. In the Active cycle, it is an output control signal

used by the 82C37A to load data to the peripheral during a DMA Read transfer.

EOP: END OF PROCESS: End of Process (EOP) is an active low bidirectional signal. Information concerning

the completion of DMA services is available at the bidirectional EOP pin. The 82C37A allows an external signal

to terminate an active DMA service by pulling the EOP pin low. A pulse is generated by the 82C37A when

terminal count (TC) for any channel is reached, except for channel 0 in memory-to-memory mode. During

memory-to-memory transfers, EOP will be output when the TC for channel 1 occurs. The EOP pin is driven by

an open drain transistor on-chip, and requires an external pull-up resistor to VCC. When an EOP pulse occurs,

whether internally or externally generated, the 82C37A will terminate the service, and if auto-initialize is

enabled, the base registers will be written to the current registers of that channel. The mask bit and TC bit in the

status word will be set for the currently active channel by EOP unless the channel is programmed for

autoinitialize. In that case, the mask bit remains clear.

A0-A3: ADDRESS: The four least significant address lines are bidirectional three-state signals. In the Idle cycle,

they are inputs and are used by the 82C37A to address the control register to be loaded or read. In the Active

cycle, they are outputs and provide the lower 4-bits of the output address.

A4-A7: ADDRESS: The four most significant address lines are three-state outputs and provide 4-bits of address.

These lines are enabled only during the DMA service.

HRQ: HOLD REQUEST: The Hold Request (HRQ) output is used to request control of the system bus. When a

DREQ occurs and the corresponding mask bit is clear, or a software DMA request is made, the 82C37A issues

HRQ. The HLDA signal then informs the controller when access to the system busses is permitted. For stand-

alone operation where the 82C37A always controls the busses, HRQ may be tied to HLDA. This will result in

one S0 state before the transfer.

DACK0-DACK3: DMA ACKNOWLEDGE: DMA acknowledge is used to notify the individual peripherals

when one has been granted a DMA cycle. The sense of these lines is programmable. RESET initializes them to

active low.

AEN: ADDRESS ENABLE: Address Enable enables the 8-bit latch containing the upper 8 address bits onto the

system address bus. AEN can also be used to disable other system bus drivers during DMA transfers. AEN is

active high.

ADSTB: ADDRESS STROBE: This is an active high signal used to control latching of the upper address byte. It

will drive directly the strobe input of external transparent octal latches, such as the 82C82. During block



operations, ADSTB will only be issued when the upper address byte must be updated, thus speeding operation

through elimination of S1 states. ADSTB timing is referenced to the falling edge of the 82C37A clock.

MEMR: MEMORY READ: The Memory Read signal is an active low three-state output used to access data

from the selected memory location during a DMA Read or a memory-to-memory transfer.

MEMW: MEMORY WRITE: The Memory Write signal is an active low three-state output used to write data to

the selected memory location during a DMA Write or a memory-to-memory transfer.

NC: NO CONNECT: Pin 5 is open and should not be tested for continuity.
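
The address-related pins described above split a 16-bit address across A0-A3, A4-A7 and the data bus (upper byte, latched externally by ADSTB). The sketch below shows that split and counts how often the upper byte must be re-latched during a block. The function names are hypothetical, and the address is assumed to increment by one per transfer.

```python
def address_pins(address16):
    """Return the value carried by each pin group for a 16-bit address."""
    if not 0 <= address16 <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return {
        "A0_A3": address16 & 0x0F,            # lower 4 bits
        "A4_A7": (address16 >> 4) & 0x0F,     # next 4 bits
        "DB0_DB7": (address16 >> 8) & 0xFF,   # upper byte, latched by ADSTB
    }


def adstb_strobes(start_address, transfers):
    """Count ADSTB pulses for a block of incrementing addresses, assuming
    the strobe is issued only when the upper address byte changes."""
    strobes = 0
    latched_upper = None
    for i in range(transfers):
        upper = ((start_address + i) >> 8) & 0xFF
        if upper != latched_upper:
            strobes += 1
            latched_upper = upper
    return strobes


print(address_pins(0x12A5))
print(adstb_strobes(0x12FE, 4))
```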

Functional Description

The 82C37A direct memory access controller is designed to improve the data transfer rate in systems which must

transfer data from an I/O device to memory, or move a block of memory to an I/O device. It will also perform

memory-to-memory block moves, or fill a block of memory with data from a single location. Operating modes

are provided to handle single byte transfers as well as discontinuous data streams, which allows the 82C37A to

control data movement with software transparency. The DMA controller is a state-driven address and control

signal generator, which permits data to be transferred directly from an I/O device to memory or vice versa

without ever being stored in a temporary register. This can greatly increase the data transfer rate for sequential

operations, compared with processor move or repeated string instructions. Memory-to-memory operations

require temporary internal storage of the data byte between generation of the source and destination addresses, so

memory-to-memory transfers take place at less than half the rate of I/O operations, but still much faster than with

central processor techniques. The block diagram of the 82C37A is shown in Fig.16.6. The timing and control

block, priority block, and internal registers are the main components. The timing and control block derives

internal timing from clock input, and generates external control signals. The Priority Encoder block resolves

priority contention between DMA channels requesting service simultaneously.

Memory Allocation to Program Segments and Blocks

Functions, Processes, Data and Stacks at the Various Segments of Memory

Segment-wise memory allocation uses four segments: Code, Data, Stack and Extra (for example, image or string data).



Different Data Structures at the Various Memory Blocks

1) Stacks – Return addresses on the nested calls, Sets of LIFO (Last In First Out) retrievable data, Saved

Contexts of the tasks as the stacks

2) Arrays – One dimensional or multidimensional

3) Queues – Sets of FIFO (First In First Out) retrievable data; Circular Queue (Example- a Printer Buffer);

Block Queue (Example- a network stack)

4) Table

5) Look-up Table – The first column of each look-up-table row points to another memory block holding the data of a data structure

6) List – In a list element, the data structure of an item also points to the next item

7) Process Control Block



Fig. Different structure of stack at memory blocks

Each stack has a pointer to the top of the stack, where the processor can read and write. A data word is always retrieved in LIFO mode from a stack.

Below fig:

(a) An array at a memory block with one pointer for its base, the first element with index = 0. A data word can be retrieved from any element by defining the pointer index.

(b) A queue at a memory block with two pointers that point to its two elements at the front and back. A data word is always retrieved in FIFO mode from a queue.

(c) A circular queue at a memory block with two pointers that point to the front and back

(d) A memory block for a pipe with front and back pointers at two different tasks
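
The circular queue in (c) can be sketched as a fixed memory block with front and back indices that wrap around. The class and method names are hypothetical, and the count field is just one common way to tell a full queue from an empty one.

```python
class CircularQueue:
    """Fixed-size FIFO queue over one memory block."""

    def __init__(self, capacity):
        self.block = [None] * capacity   # the memory block
        self.front = 0                   # index of the next word to read
        self.back = 0                    # index of the next free slot
        self.count = 0                   # number of words stored

    def put(self, word):
        if self.count == len(self.block):
            raise OverflowError("queue is full")
        self.block[self.back] = word
        self.back = (self.back + 1) % len(self.block)   # wrap around
        self.count += 1

    def get(self):
        if self.count == 0:
            raise IndexError("queue is empty")
        word = self.block[self.front]
        self.front = (self.front + 1) % len(self.block)  # wrap around
        self.count -= 1
        return word


queue = CircularQueue(3)
for word in (1, 2, 3):
    queue.put(word)
print(queue.get())   # words come out in FIFO order
queue.put(4)         # reuses the slot freed at the start of the block
print([queue.get() for _ in range(3)])
```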





MEMORY MAP

A memory map shows the allocation of program and data addresses to ROM, RAM, EEPROM or Flash in the system.

PRINCETON ARCHITECTURE

• 80x86 processors and ARM7 have Princeton architecture for main memory. (8051-family microcontrollers have Harvard architecture.)

• Vectors and pointers, variables, program segments and memory blocks for data and stacks have different addresses in the program in Princeton memory architecture.

HARVARD ARCHITECTURE

• The address spaces for the data and for the program are distinct

• Suited to handling streams of data that must be accessed by single-instruction multiple-data (SIMD) type instructions and DSP instructions.

• Separate data buses ensure simultaneous accesses for instructions and data. Program segments and memory blocks for data and stacks have separate sets of addresses in Harvard architecture.

• Control signals and read-write instructions are also separate for accessing the program memory and data memory.



Harvard and Princeton Memory Organizations



Memory map for an exemplary embedded system, smart card needing 2 kB memory

Memory map for an exemplary Java embedded card with software for encrypting and deciphering the transactions



Memory map sections in a smart card

Memory map sections in another smart card



INTERFACING PROCESSOR, MEMORIES AND I/O DEVICES

REAL WORLD INTERFACING

Interfacing Using System Bus

Interfacing of processor, memory and IO devices using memory system bus

• System bus ─ interconnections for a simple bus structure have three sets of signals

• System bus ─ defined by address bus, data bus, and control bus

• A system-bus interfacing design is made according to the timing diagrams of the processor signals, speed, and word length for instructions and data.

Processor internal bus(es) and external bus(es).

Characteristics differ in the system

Interconnections for a simple bus structure

Address Bus

• Processor issues the address of the instruction byte or word to the memory system through the address bus.

• Processor execution unit, when required, issues the address of the data (byte or word) to be read or written using the memory system through the address bus.

• An address bus of 32 bits is used to fetch the instruction or data from an address specified by a 32-bit number.



EXAMPLE

• Let a processor, at the start, reset the program counter to address 0. The processor then issues address 0 on the address bus, and the instruction at address 0 is fetched from memory on reset.

• Let a processor instruction be such that it needs to load register r1 from the memory address M. The processor issues address M on the address bus, and the data at address M is fetched.

Data Bus

• Instruction fetch ─ When the processor issues the address of the instruction, it gets back the instruction through the data bus.

• Data read ─ When it issues the address of the data, it loads the data through the data bus.

• Data write ─ When it issues the address of the data, it stores the data in the memory through the data bus.

• A data bus of 32 bits fetches, loads, or stores an instruction or data of 32 bits.

EXAMPLE

• Processor issues address m for an instruction and fetches the instruction through the data bus from address m. [For a 32-bit instruction, the word on the data bus comes from addresses m, m + 1, m + 2, and m + 3.]

• When an instruction executes to store the bits of register r1 to the memory address M, the processor issues address M on the address bus and sends the data for address M through the data bus. [For 32-bit data, the word on the data bus is sent to the memory addresses M, M + 1, M + 2, and M + 3.]
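
The note that a 32-bit word occupies addresses M to M + 3 can be made concrete with a byte-addressable memory model. The helper names are hypothetical, and the default little-endian byte order is an assumption; the text does not say which byte of the word goes to address M.

```python
def store_word(memory, address, value, byteorder="little"):
    """Store a 32-bit word into byte addresses address .. address + 3."""
    for offset, byte in enumerate(value.to_bytes(4, byteorder)):
        memory[address + offset] = byte


def load_word(memory, address, byteorder="little"):
    """Load a 32-bit word from byte addresses address .. address + 3."""
    data = bytes(memory[address + offset] for offset in range(4))
    return int.from_bytes(data, byteorder)


memory = bytearray(16)   # small byte-addressable memory
M = 4
store_word(memory, M, 0x11223344)
print([hex(b) for b in memory[M:M + 4]])
print(hex(load_word(memory, M)))
```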

Control Bus

• Issues signals to control the timing of various actions during interconnection.

• Signals synchronize all the subsystems.

• Address latch enable (ALE) [address strobe (AS) or address valid (ADV)]

• Memory ‘read’ (RD) or ‘write’ (WR), IO ‘read’ (IORD) or ‘write’ (IOWR), or ‘data valid’ (DAV)

• Other control signals as per the processor design.

Interrupts and DMA Control Signals

• Interrupt acknowledge (INTA) [on a request for drawing the processor's attention to an event]

• INT (interrupt) from an external device to the system

• Hold acknowledge (HLDA) [on an external hold request, for permitting use of the system buses]

• HOLD when an external device sends a hold request for direct memory access (DMA).

EXAMPLE

• When the processor issues the address, it also issues a memory-read control signal and waits for the data or instruction.

• Memory unit must place the instruction or data on the data bus during the interval in which the memory-read signal is active (not inactivated by the processor).

• For a write, the processor issues the address on the address bus and (after allowing sufficient time for all the address bits to set up) places the data on the data bus; it then issues the memory-write control signal (after allowing sufficient time for all the data bits to set up) to store the data in memory.

• Memory unit must write (store) the data during the interval in which the memory-write signal is active (not inactivated by the processor).

Program memory access and data buses multiplexed for memory access in Harvard Architecture

• Address and data buses are multiplexed

• Control signal PSEN is active when accessing program memory using the address and data buses

• Control signal Read or Write is active when accessing data memory using the address and data buses

Time division multiplexed (TDM) address and data bits for the memories

• TDM ─ In different time slots there is a different set (channel) of signals: address signals during one time slot and data bus signals in another time slot.

• The interfacing circuit for demultiplexing the buses uses a control signal in such systems.

• The control signal is Address Latch Enable (ALE) in the 8051, Address Strobe (AS) in the 68HC11 and address valid (ADV) in the 80196.

• ALE, AS or ADV demultiplexes the address and data buses to the devices.

Interfacing circuit using latch and decoders

• ALE for latching the address

• PSEN for program memory read using the address and data buses

• Each chip of the memory or port that connects to the processor has a separate chip select input from a decoder.

• A decoder is a circuit that takes the appropriate signals of the address bus and the control circuit signals at its input and generates the corresponding CS (chip select) control signal for each device (memory and ports).
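
A decoder of the kind described above can be sketched as a function from an address to a set of chip-select levels. The 16-bit address width, the use of the top two address bits, and the four chip-select names are all assumptions chosen for illustration; a real design derives them from its own memory map.

```python
# Hypothetical devices, one per 16 KB quarter of a 64 KB address space.
CHIP_SELECTS = ("CS_ROM", "CS_RAM", "CS_PORTS", "CS_SPARE")


def decode_chip_selects(address):
    """2-to-4 decoder on address bits A15 and A14.

    Returns the level of each active-low chip select: 0 for the one
    selected device, 1 for all the others."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    selected = (address >> 14) & 0b11
    return {
        name: (0 if index == selected else 1)
        for index, name in enumerate(CHIP_SELECTS)
    }


print(decode_chip_selects(0x4000))
```

Exactly one chip select is active for any address, so only one device drives or reads the shared data bus at a time.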



Interfacing circuit

• Consists of latches, decoders and demultiplexers

• Designed as per the available control signals and timing diagrams of the bus signals.

• Circuit connects all the units (processor, memory and IO devices) through the system buses.

• Also called glue circuit, as it joins the devices and memory with the system bus and processor

• Can be designed using a GAL (generic array logic) or FPGA



2. Interfacing Using System and IO Buses

System Bus and IO Bus

System bus interconnects

• Processor

• Memory systems and

• Subsystems

• Another set of signals is called the I/O bus

• Interfacing of the processor is with the system bus at the first level and the IO bus at the second level

Popular IO buses and wireless communication

• PCI bus interfaces to devices designed to meet the PCI standard.

• USB interfaces to devices designed to meet the USB standard.

Memory system bus and I/O bus interconnections in a bus structure



3. Multilevel Buses

4. Addresses of Ports and Devices in Real World Interfacing

Device Control Register, Status Register, Receive Buffer, Transmit Buffer

• Each I/O device is at a distinct address or set of addresses

• Each device has three sets of registers ─ data buffer register(s), control register(s) and status register(s)

Device Addresses

• Device control and status addresses and port addresses remain constant and are not relocatable in a program, as the glue circuit (hardware) to access these is fixed during the circuit design.

• There can be common addresses for input and output buffers, for example SBUF in the 8051

The processor, memory, devices glue circuit

The processor, memory and devices are interfaced (glued) together using a programmable circuit like a GAL or FPGA. The circuit consists of the address decoders as per the memory and device addresses allocated, and the needed latches and multiplexers/demultiplexers.

Device Addresses

• There may be common addresses for control and status bits

• There can be control bits that change the function of a register at a device address

Example

• Serial line device: addresses of device registers

• Fixed by the hardware configuration of the UART port interface circuit in a system employing an 80x86 processor.

• 0x2F8 to 0x2FE at COM2 in IBM PC (COM1 uses 0x3F8 to 0x3FE)

Features of UART serial line device in PC

• Two I/O data buffer registers (one for receiving and the other for transmitting) at a common address, 0x2F8

• The two bytes of the Divisor Latch are at distinct addresses, 0x2F8 (LSB) and 0x2F9 (MSB)

• Three Control Registers of the device are at three distinct addresses, 0x2F9, 0x2FB and 0x2FC

• Three Status Registers of the device are at three distinct addresses, 0x2FA, 0x2FD and 0x2FE
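
The sharing of addresses among the UART registers can be sketched as a lookup that depends on the access direction and on a control bit. The register names below follow the usual 8250-style PC UART convention and are added for illustration, as are the function name and the dlab parameter (the divisor-latch access bit held in the line control register); the text above gives only the addresses.

```python
COM2_BASE = 0x2F8


def uart_register(address, is_read, dlab=False):
    """Name the COM2 UART register selected by an I/O access."""
    offset = address - COM2_BASE
    if offset == 0:
        if dlab:
            return "divisor latch LSB"
        # Same address, different register depending on direction.
        return "receive buffer" if is_read else "transmit buffer"
    if offset == 1:
        # A control bit changes the function of this address.
        return "divisor latch MSB" if dlab else "interrupt enable"
    fixed = {
        2: "interrupt identification",   # status
        3: "line control",               # control
        4: "modem control",              # control
        5: "line status",                # status
        6: "modem status",               # status
    }
    if offset in fixed:
        return fixed[offset]
    raise ValueError(f"{address:#x} is outside the COM2 register range")


print(uart_register(0x2F8, is_read=True))
print(uart_register(0x2F8, is_read=False))
print(uart_register(0x2F9, is_read=True, dlab=True))
```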

Device Addresses

• Processor accesses device registers and buffer registers from allocated addresses for the Ports and

Devices



5. Memory Mapped IO to Ports and Devices

Memory mapped IO or device access ─ Processor access to a device is as if to a memory address

Interfacing Processor with Memory Mapped IO

• No separate I/O address space exists for the ports and devices.

• Instructions as well as control signals for operations on bytes at the memory, IO port and device addresses are the same.

• No separate input-output and memory load-store instructions.

• Arithmetic, logical and bit manipulation instructions that are available for data in memory are also available for the IO operations.

• Enables direct manipulation of the data taken from, or stored at, the IO port or device.

• All the arithmetic, logical and bit manipulation operations that are available for data in memory can be done using an accumulator, any other register or any other memory address, where the IO port byte is transferred after, during or before the arithmetic or logical operation.
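
The points above can be illustrated with a toy model in which a port is just another address in the single address space, so the same load, store and bit-manipulation operations work on both. The class name, the method names and the port address 0x1000 are hypothetical.

```python
class MemoryMappedSystem:
    """One address space shared by memory cells and device ports."""

    def __init__(self, port_addresses):
        self.port_addresses = set(port_addresses)
        self.cells = {}   # byte contents, whether memory or port

    def is_port(self, address):
        """Only address decoding distinguishes a port from memory."""
        return address in self.port_addresses

    def load(self, address):
        return self.cells.get(address, 0)

    def store(self, address, value):
        self.cells[address] = value & 0xFF

    def set_bits(self, address, mask):
        """Bit manipulation that works the same on memory and ports."""
        self.store(address, self.load(address) | mask)


system = MemoryMappedSystem(port_addresses={0x1000})
system.store(0x0040, 0x0F)      # ordinary memory byte
system.set_bits(0x0040, 0xF0)
system.store(0x1000, 0x01)      # port byte, same operations
system.set_bits(0x1000, 0x80)
print(hex(system.load(0x0040)), hex(system.load(0x1000)))
```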

Memory mapped IOs Example

8051 microcontroller devices have addresses for processor accesses that are not distinct from the memory and are accessed with the same set of instructions and control signals RD and WR.

Processor and memory organization with I/O devices memory assignments in the

68HC11 (having memory mapped IO architecture)

Memory and Port addresses in 68HC11



6. IO Address Mapped IO Port or Device Access

IO mapped IO

• Processor access to a device is by distinct instructions and control signals.

• A memory address is accessed by Load and Store instructions, whereas an IO device address is accessed by a distinct set of instructions, IN and OUT, and a distinct set of control signals (IORD and IOWR).

IO mapped IOs Example

The 80x86 processor accesses external devices using addresses in an I/O space that is distinct from the memory space.

Features of IO address mapped IOs

• A separate I/O address space exists for the ports and devices.

• The instructions and control signals for operations on bytes at memory addresses and at IO port and device addresses are distinct.

• Advantage of simplicity.

• IO device and port addresses are interfaced independently of memory, without considering the memory addresses that are assigned to software and data.

• The processor has separate input-output (read and write) instructions and memory load-store (read and write) instructions.

• All the arithmetic, logical and bit manipulation operations that are available for data in memory can be done on an IO port byte through an accumulator, to which the byte is transferred before the arithmetic or logical operation.
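
The last point can be sketched in C on a host machine. Here `io_in` and `io_out` are hypothetical stand-ins for the IN and OUT instructions, and a separate array stands in for the I/O address space. Changing bits of a port takes three steps: read into an accumulator, operate, write back.

```c
#include <assert.h>
#include <stdint.h>

#define IO_SPACE_SIZE 0x400u

/* Stand-in for the I/O address space, separate from memory. */
static uint8_t io_space[IO_SPACE_SIZE];

/* Stand-in for the IN instruction. */
uint8_t io_in(uint16_t port) {
    assert(port < IO_SPACE_SIZE);
    return io_space[port];
}

/* Stand-in for the OUT instruction. */
void io_out(uint16_t port, uint8_t value) {
    assert(port < IO_SPACE_SIZE);
    io_space[port] = value;
}

/* Bit manipulation goes through the accumulator: IN, OR, OUT. */
void io_set_bits(uint16_t port, uint8_t mask) {
    uint8_t acc = io_in(port); /* transfer the port byte first */
    acc |= mask;               /* logical operation on the accumulator */
    io_out(port, acc);         /* write the result back */
}
```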

[Figure: Device addresses in an 80x86 based PC]

7. Interrupts and IOs

Interrupt driven IO

Processor access to a device is by executing an ISR on a device interrupt, for example an interrupt on timer overflow, keyboard data ready or transmit-data buffer empty. Interrupt driven IO is used when the processor needs to perform a prolonged data transfer operation using an I/O device and wants to be able to do other work while waiting for the transfer to complete.

IO devices function slowly compared to the processor.

• Interrupt driven IO can be used in those cases.

• An interrupt is the mechanism used by most processors to handle asynchronous events.

An interrupt allows a device to request that the processor stop what it is currently doing and execute software called an interrupt service routine (ISR) to process the device's request, much like a procedure call. Here the call is initiated by an event at the external or internal device rather than by a program instruction running on the processor.

Keyboard Example

• It takes about 10 ms to send the code for a key, and a maximum of 10 keys can be pressed in 1 s.

• When a key input event occurs is not fixed.

• The intervals between successive key input events are not fixed.

• In interrupt driven mode, when a key is pressed, an interrupt signal RxRDY (receiver data ready) to the processing unit causes the execution of a service routine, and the service routine reads the byte for the key code.
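
The keyboard case above can be sketched as a service routine that stores each key code in a small ring buffer, which the main program empties whenever it has time. This is a host-side illustration only: `kb_isr`, `kb_get` and the buffer size are hypothetical names and values, and the `code` parameter stands in for reading the receive-data register on RxRDY.

```c
#include <assert.h>
#include <stdint.h>

#define KB_BUF_SIZE 16u /* hypothetical buffer size */

static volatile uint8_t kb_buf[KB_BUF_SIZE];
static volatile uint8_t kb_head; /* next free slot, written by the ISR */
static volatile uint8_t kb_tail; /* next unread slot, read by main code */

/* Runs on each RxRDY interrupt and stores the key code. */
void kb_isr(uint8_t code) {
    uint8_t next = (uint8_t)((kb_head + 1u) % KB_BUF_SIZE);
    if (next != kb_tail) { /* drop the key if the buffer is full */
        kb_buf[kb_head] = code;
        kb_head = next;
    }
}

/* Called from the main program: returns 1 and a code if one is waiting. */
int kb_get(uint8_t *code) {
    assert(code != 0);
    if (kb_tail == kb_head) {
        return 0; /* nothing pressed since the last call */
    }
    *code = kb_buf[kb_tail];
    kb_tail = (uint8_t)((kb_tail + 1u) % KB_BUF_SIZE);
    return 1;
}
```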

[Figure: Keyboard interrupt]

Printer example

• A maximum of 300 characters can be printed in 1 s.

• So it takes about 3.3 ms (1 s / 300) to print the code sent to the output by a port.

• When a print operation for a character completes is not fixed.

• The intervals between successive character print events are not fixed.

• In interrupt driven mode, when a print action completes, an interrupt signal TxDE (transmission data empty) to the printer processing unit (print controller) causes the execution of a service routine, and the service routine sends another byte as output.
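
The printer case is the mirror image of the keyboard case: each TxDE interrupt triggers the output of the next byte. Below is a minimal host-side sketch in C. The function names are hypothetical and `port_out` stands in for the output port data register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint8_t *tx_data; /* text being printed */
static size_t tx_len;          /* number of bytes to print */
static size_t tx_pos;          /* index of the next byte to send */
static uint8_t port_out;       /* stand-in for the output port register */

/* Runs on each TxDE interrupt. Returns 1 if a byte was sent, else 0. */
int printer_isr(void) {
    if (tx_pos >= tx_len) {
        return 0; /* nothing left to print */
    }
    port_out = tx_data[tx_pos++];
    return 1;
}

/* Software sends the first byte; interrupts drive the rest. */
void printer_start(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    tx_data = data;
    tx_len = len;
    tx_pos = 0;
    (void)printer_isr();
}

/* Last byte written to the port (for inspection only). */
uint8_t printer_port(void) { return port_out; }
```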

[Figure: Printer interrupt]

Bus Arbitration Mechanisms

1. Bus Sharing by Multiple Processors or Controllers

Bus Arbitration Requirement

• Several processors and several single purpose processors share a bus.*

• The bus can be granted to only one processor at any instant.

*[A single purpose processor is also called a controller. A controller can be part of a device or peripheral.]




Bus Arbitration Mechanism

• System buses are shared between the controllers and an IO processor, and multiple controllers have to access the bus, but only one of them can be granted bus master status at any one instant.

• The bus master has access to the bus at that instant.

Bus arbitration process

A process by which the current bus master accesses the bus, then gives up control of the bus and passes it to another bus-requesting processor unit.

Three methods in the bus arbitration process:

• Daisy chain method

• Independent bus requests and grant method

• Polling method



Daisy Chaining for Bus Sharing by Multiple Processors or Controllers

Daisy chaining method

• Centralized bus arbitration process.

• Bus control passes from one bus master to the next one, then to the next, and so on.

• Bus control passes from controller unit C0 to C1, then to C2, then to C3, and so on.

Sequence of signals in the arbitration process

• A bus-grant signal (BG), which functions like a token, is first sent to C0.

• If C0 does not need the bus, it passes BG to C1.

• A controller needing the bus raises a bus request (BR) signal.

• A bus-busy (BUSY) signal is generated when that controller becomes the bus master.

• When the bus master no longer needs the bus, it deactivates BR, and the BUSY signal also deactivates.

• Another BG is issued and passed from C0 down to the lower priority controllers one by one [for example, COM2 to COM1 in the IBM PC].

Daisy chain method advantage

• At each instance of bus access, the i-th controller gets higher priority to the bus than the (i + 1)-th.

• The priorities of the controllers and processors for granting bus access (bus master status) are fixed.
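
The way BG travels down the chain can be modelled in a few lines of C. This sketch assumes the requests are given as a bit mask (bit i set means Ci is raising BR). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the index of the controller that keeps the bus grant,
 * or -1 if no controller is requesting the bus.
 */
int daisy_chain_grant(uint32_t requests, int n_controllers) {
    assert(n_controllers > 0 && n_controllers <= 32);
    for (int i = 0; i < n_controllers; i++) { /* BG enters at C0 */
        if (requests & (1u << i)) {
            return i; /* Ci needs the bus and does not pass BG on */
        }
        /* Ci does not need the bus, so BG passes to C(i+1) */
    }
    return -1;
}
```

With requests from C2 and C3 (mask 0x0C), C2 becomes the bus master because BG reaches it first.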



Independent Request and Grant Method for Bus Sharing by Multiple Processors or Controllers

Independent bus request method

• The controllers have separate BR signals, BR0, BR1, …, BRn.

• There are separate BG signals, BG0, BG1, …, BGn, for the controllers.

• The i-th controller sends BRi (the i-th bus request signal), and when it receives BGi (the i-th bus grant signal) it uses the bus and the BUSY signal activates.

• Any controller that finds BUSY active does not send its BR.

• The advantage of the independent bus request method is that the i-th controller can be programmed to get the highest priority to the bus, and the priority of a controller can be programmed dynamically.
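
A short C sketch of an arbiter with programmable priorities follows. It assumes the BR lines are given as a bit mask and that the priorities sit in a table that software can rewrite at any time. The name and the convention that a larger value means higher priority are illustrative choices, not taken from the notes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * br:       bit i set means BRi is active.
 * priority: priority[i] is the programmed priority of controller i
 *           (larger value wins in this sketch).
 * Returns the index i for which BGi is asserted, or -1 if no request.
 */
int independent_grant(uint32_t br, const uint8_t *priority, int n) {
    assert(priority != 0 && n > 0 && n <= 32);
    int winner = -1;
    for (int i = 0; i < n; i++) {
        if ((br & (1u << i)) &&
            (winner < 0 || priority[i] > priority[winner])) {
            winner = i;
        }
    }
    return winner;
}
```

Rewriting one table entry is enough to change which controller wins the next arbitration, which is the dynamic programmability mentioned above.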

Polling Method for Bus Sharing by Multiple Processors or Controllers

Polling the requesting device method

• A poll count value is sent to the controllers and is incremented. Assume that there are 8 controllers. Three poll count signals p2, p1, p0 successively change through 000, 001, …, 110, 111, 000, … If a BR signal is received at count = i, the count stops incrementing and BG is sent.

• BUSY then activates when that controller becomes the bus master. When BR deactivates, BG and BUSY also deactivate and the count starts incrementing again.

The advantage of the polling method is that the controller next to the current bus master gets the highest priority to access the bus after the current bus master finishes its operations on the bus.
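
The poll counter can be modelled as below. This is a sketch with hypothetical function names. The requests are given as a bit mask and the counter is kept in a variable owned by the caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advances the poll count from its current value until it reaches a
 * controller whose BR is active. Returns that controller and leaves
 * *count stopped there, or returns -1 after one full cycle.
 */
int poll_grant(uint32_t br, int n, int *count) {
    assert(count != 0 && n > 0 && n <= 32);
    assert(*count >= 0 && *count < n);
    for (int step = 0; step < n; step++) {
        int i = (*count + step) % n;
        if (br & (1u << i)) {
            *count = i; /* counting stops, BG is sent */
            return i;
        }
    }
    return -1;
}

/* When BR deactivates, counting resumes at the next controller. */
void poll_release(int n, int *count) {
    assert(count != 0 && n > 0);
    *count = (*count + 1) % n;
}
```

Because counting resumes just after the controller that released the bus, that controller is polled last in the next round.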

Interfacing examples with keyboard, displays, D/A and A/D Conversions

Keyboard

Two signals, KBINT and TxD, come from the keyboard controller.

• KBINT is the interrupt from the keyboard controller.

• TxD is the serial UART data output of the controller, connected to RxD at the SI (serial interface) in the 8051, or to a UART such as the Intel 8250 or the 16550, which includes a 16-byte buffer.

Debouncer

• Bounces occur when a key is pressed; each bounce creates a false pulse.

• The keyboard controller has a hardware debouncer to take care of the bouncing of a key.

Scan Clock

• The keyboard controller has a counter driven by a scan clock, which increments continuously at a certain rate and scans each key to check whether it is in the pressed or released state.



Keyboard Interface to Serial Interface at Microcontroller

Encoder

• Encodes the keyboard output for a ROM.

• The ROM generates the ASCII code output for the pressed key.

• The code accounts for multiple keys pressed simultaneously.

• For example, if the Shift key is also pressed, the code for the upper case character is generated.

TxD

• The code bits are serially transferred as the TxD output.
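
The ROM lookup described under Encoder can be pictured as a table indexed by the key position and the Shift state. The C sketch below uses a hypothetical four-key keyboard. The real key numbering and ROM contents depend on the keyboard controller.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_KEYS 4u /* hypothetical tiny keyboard: a, b, c, 1 */

/* "ROM" contents: row 0 without Shift, row 1 with Shift pressed. */
static const char key_rom[2][NUM_KEYS] = {
    { 'a', 'b', 'c', '1' },
    { 'A', 'B', 'C', '!' }
};

/* Returns the ASCII code for the key found pressed by the scan. */
char encode_key(uint8_t key_index, int shift_pressed) {
    assert(key_index < NUM_KEYS);
    return key_rom[shift_pressed ? 1 : 0][key_index];
}
```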

LCD DISPLAY CONTROLLER

LCD Controller Interface

• 3 control bits for E, RS and R/W.

• 8 bits of output data.

• One 8-bit port is used for the output data for display.

• Another port is used for the 3 control bits.
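
A write to the LCD controller through the two ports might look like the sketch below. It assumes the convention of the common HD44780-type character LCD controllers (RS = 0 for a command and 1 for data, R/W = 0 for a write, data latched on the falling edge of E). The notes do not name the controller. The bit positions and the two port variables are hypothetical stand-ins for real port registers.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_E  0x01u /* hypothetical bit positions on the control port */
#define LCD_RS 0x02u
#define LCD_RW 0x04u

uint8_t lcd_data_port; /* stand-in for the 8-bit data port */
uint8_t lcd_ctrl_port; /* stand-in for the port carrying E, RS, R/W */

/* Writes one byte: is_data = 0 for a command, nonzero for a character. */
void lcd_write(uint8_t byte, int is_data) {
    assert((lcd_ctrl_port & LCD_E) == 0); /* E must be idle low */
    lcd_data_port = byte;
    lcd_ctrl_port = (uint8_t)(is_data ? LCD_RS : 0u); /* R/W = 0: write */
    lcd_ctrl_port |= (uint8_t)LCD_E;   /* raise E */
    lcd_ctrl_port &= (uint8_t)~LCD_E;  /* falling edge latches the byte */
}
```

A real driver would also wait for the controller to finish each operation, which is omitted here.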



DAC

DAC using PWM and integrator

DAC: a PWM circuit and an integrator.

• The PWM is an internal device in a microcontroller.

• A pulse width register (PWR) is programmed according to the required analog output.

PWM Functioning

• A counter/timer device generates two internal interrupts, one on timer overflow and another after an interval proportional to PWR.

• On the first interrupt the output becomes 1, and on the second interrupt it becomes 0.

Integrator

• Generates the analog output according to the period for which the output = 1 (the period between the first and second interrupts) compared to the total period of the output pulses (the period between successive first interrupts).
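
The integrator output is the average of the pulse train, so for an ideal circuit it equals the logic-high voltage multiplied by the duty cycle PWR / period. The C sketch below does this arithmetic in millivolts. The function name and the idea of passing the high level as a parameter are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * pwr:      pulse width register value (timer counts with output = 1)
 * period:   timer counts between successive overflows
 * vhigh_mv: voltage of logic 1 in millivolts
 * Returns the ideal averaged output in millivolts.
 */
uint32_t pwm_dac_mv(uint32_t pwr, uint32_t period, uint32_t vhigh_mv) {
    assert(period > 0 && pwr <= period);
    return (uint32_t)(((uint64_t)vhigh_mv * pwr) / period);
}
```

With an 8-bit timer (period 256) and a 5 V high level, PWR = 64 gives 1250 mV and PWR = 128 gives 2500 mV.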



DAC using an external DAC chip

ADC

ADC using an external ADC chip

• A start-of-conversion pulse generator circuit.

• A sample-and-hold amplifier circuit to hold the signal constant for the conversion period, and a signal conditioner.

• Voltage references + and – for providing the references for conversion of the analog input.

n-bit ADC

• A four or eight channel ADC is built into microcontrollers, or an external ADC is used, for example the ADC0808.

• Interfacing is similar to that of the ports.
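
How the two reference voltages and the number of bits determine the output code can be shown with a small calculation. The C sketch below uses one common idealization, code = floor((Vin - Vref-) / (Vref+ - Vref-) x 2^n), limited to the range 0 to 2^n - 1. Real converters add quantization offset and other errors, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ideal n-bit ADC output code for an input given in millivolts. */
uint32_t adc_code(int32_t vin_mv, int32_t vref_lo_mv, int32_t vref_hi_mv,
                  unsigned n_bits) {
    assert(vref_hi_mv > vref_lo_mv && n_bits >= 1 && n_bits <= 16);
    uint32_t max_code = (1u << n_bits) - 1u;
    if (vin_mv <= vref_lo_mv) {
        return 0; /* at or below the negative reference */
    }
    if (vin_mv >= vref_hi_mv) {
        return max_code; /* at or above the positive reference */
    }
    uint64_t span = (uint64_t)(vref_hi_mv - vref_lo_mv);
    uint64_t code = ((uint64_t)(vin_mv - vref_lo_mv) << n_bits) / span;
    return (uint32_t)code;
}
```

For an 8-bit converter with references at 0 V and 5 V, an input of 2.5 V gives the code 128.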

=================================================================================

CASE STUDY

• Automatic Washing machine

• Chocolate vending machine

• Digital Camera and Voice recorder

Automatic Chocolate Vending Machine (ACVM)



ACVM

• Coin insertion slot.

• Keypad on the top of the machine.

• LCD display unit on the top of the machine. It displays menus, text entered into the ACVM, pictograms, and welcome, thank-you and other messages.

• Graphic interactions with the machine.

• Displays time and date.

• Delivery slot for the chocolate and for coins, if refunded.

• Internet connection port so that the owner can check the status of ACVM sales remotely.

ACVM Hardware units

• Microcontroller or ASIP (Application Specific Instruction Set Processor)

• RAM for storing temporary variables and stack

• ROM for application codes and RTOS codes for scheduling the tasks

• Flash memory for storing user preferences, contact data, user address, user date of birth, user identification code, answers of FAQs

• Timer and Interrupt controller

• A TCP/IP port (Internet broadband connection) to the ACVM for remote control and for getting ACVM status reports by the owner

• ACVM specific hardware

• Power supply

ACVM Software components

• Keypad input read

• Display

• Read coins

• Deliver chocolate

• TCP/IP stack processing

• TCP/IP stack communication

Digital Camera

A typical Camera

• 4 M pixel/6 M pixel still images, clear visual display (ClearVid) CMOS sensor, 7 cm wide LCD photo display screen, enhanced imaging processor, double anti-blur solution and high-speed processing engine, 10X optical and 20X digital zooms.

• Records high definition video clips. It therefore has a speaker and microphone(s) for high quality recorded sound.

• Audio/video Out port for connecting to a TV/DVD player.

Arrangements

• Keys on the camera.

• Shutter, lens and charge coupled device (CCD) array sensors.

• Good resolution, photo quality LCD display unit.

• Displays text such as image title, shooting date and time, and serial number. It displays messages. It displays the GUI menu when the user interacts with the camera.

• Self-timer lamp for flash.



Internal units

• Internal flash memory to store the OS, the embedded software and a limited number of image files.

• Flash memory stick of 2 GB or more for large storage.

• Universal Serial Bus (USB), Bluetooth and serial COM ports for connecting to a computer, mobile and printer.

• LCD screen to display the frame view.

• Saved images are displayed using the navigation keys.

• Frame light falls on the CCD array, which through an ADC transmits the bits for each pixel in each row of the frame, and also for the dark area pixels in each row, which are used for offset correction of the light intensities signaled by the CCD for that row.

• The CCD bits of each pixel in each row and column are offset corrected by the CCD signal processor (CCDSP).
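
The offset correction step can be illustrated with a short C sketch. It assumes, purely for illustration, that the offset of a row is the average of that row's dark area pixels and that it is subtracted from every pixel in the row. The notes do not specify the exact method used by the CCDSP, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * pixels: ADC values of the exposed pixels of one row (corrected in place)
 * dark:   ADC values of the dark area pixels of the same row
 */
void offset_correct_row(uint16_t *pixels, size_t n_pixels,
                        const uint16_t *dark, size_t n_dark) {
    assert(pixels != NULL && dark != NULL && n_dark > 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n_dark; i++) {
        sum += dark[i];
    }
    uint16_t offset = (uint16_t)(sum / n_dark); /* assumed: mean dark level */
    for (size_t i = 0; i < n_pixels; i++) {
        /* subtract the offset, clamping at zero */
        pixels[i] = (pixels[i] > offset) ? (uint16_t)(pixels[i] - offset) : 0;
    }
}
```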

ASIP and Single purpose processors

• For signal compression using a JPEG CODEC, with each frame saved in one jpg file.

• For DSP for compression using discrete cosine transformations (DCTs), and for decompression.

• For DCT and Huffman coding for the JPEG compression.

• For decompression by inverse DCT before the DAC sends the input to the display unit through the pixel processor.

• Pixel processor (for example, image contrast, brightness, rotation, translation, color adjustment).

Digital Camera Hardware units

• Microcontroller or ASIP (Application Specific Instruction Set Processor)

• Multiple processors (CCDSP, DSP, Pixel Processor and others)

• RAM for storing temporary variables and stack

• ROM for application codes and RTOS codes for scheduling the tasks

• Timer, Flash memory for storing user preferences, contact data, user address, user date of birth, user identification code, ADC, DAC and Interrupt controller

• The DAC gets its input from the pixel processor, which gets its inputs from the JPEG file for saved images and also directly from the CCDSP for the frame in the present view

• USB controller

• Direct Memory Access controller

• LCD controller

• Battery and external charging circuit

Digital Camera Software components

• CCD signal processing for offset correction

• JPEG coding

• JPEG decoding

• Pixel processing before display

• Memory and file systems

• Light, flash and display device drivers

• LCD, USB and Bluetooth port device drivers for port operations for display, printer and computer communication control

[Figure: Digital camera software components]

Date post:	08-Mar-2015
Category:	Documents
Upload:	sujith