R 402 Computer Organization
Computer Science & Engineering Dept. SJCET, Palai
COMPUTER ORGANIZATION
R 402 2+1+0
Module 1
Introduction: Organization and Architecture – Review of basic operational
concepts – CPU- single bus and two bus organization, Execution of a complete
instruction – interconnection structures – layered view of a computer system.
Module 2
CPU - Arithmetic: Signed addition and subtraction – serial and parallel adder –
BCD adder – Carry look-ahead adder, Multiplication – Array multiplier – Booth's
Algorithm, Division – Restoring and non-restoring division, floating point
arithmetic - ALU Design.
Module 3
Control Unit Organization: Processor Logic Design – Processor Organization –
Control Logic Design – Control Organization – Hardwired control –
Microprogram control – PLA control – Microprogram sequencer, Horizontal and
vertical micro instructions – Nano instructions.
Module 4
Memory: Memory hierarchy – RAM and ROM – Memory system considerations
– Associative memory, Virtual memory – Cache memory – Memory interleaving.
Module 5
Input – Output: Printers, Plotters, Displays, Keyboard, Mouse, OMR and OCR,
Device interface – I/O processor – Standard I/O interfaces – RS 232 C, IEEE
488.2 (GPIB).
References
1. Computer Organization - Hamacher, Vranesic and Zaky, McGraw Hill
2. Digital Logic and Computer Design - Morris Mano, PHI
3. Computer Organization and Architecture -William Stallings, Pearson Education
Asia.
4. Computer Organization and Design - Pal Chaudhuri, PHI
5. Computer Organization and Architecture -M Morris Mano, PHI
6. Computer Architecture and Organization - John P Hayes, McGraw Hill
1.1 Introduction to Computer organization and architecture
In describing computer systems, a distinction is often made between computer architecture and computer organization.
Computer architecture refers to those attributes of a system visible to a programmer, or
put another way, those attributes that have a direct impact on the logical execution of a
program.
Computer organization refers to the operational units and their interconnection that
realize the architecture specification.
Examples of architecture attributes include the instruction set, the number of bits used to
represent various data types (e.g., numbers and characters), I/O mechanisms, and
techniques for addressing memory.
Examples of organization attributes include those hardware details transparent to the
programmer, such as control signals, interfaces between the computer and peripherals,
and the memory technology used.
As an example, it is an architectural design issue whether a computer will have a multiply
instruction. It is an organizational issue whether that instruction will be implemented by a
special multiply unit or by a mechanism that makes repeated use of the add unit of the
system. The organization decision may be based on the anticipated frequency of use of
the multiply instruction, the relative speed of the two approaches, and the cost and
physical size of a special multiply unit.
Historically, and still today, the distinction between architecture and organization has
been an important one. Many computer manufacturers offer a family of computer models,
all with the same architecture but with differences in organization. Consequently, the
different models in the family have different price and performance characteristics.
Furthermore, an architecture may survive many years, but its organization changes with
changing technology.
Basic Structure of a Computer
Figure 1 shows the general structure of the IAS computer. It consists of:
A main memory, which stores both data and instructions.
An arithmetic-logical unit (ALU) capable of operating on binary data.
A control unit, which interprets the instructions in memory and causes them to be
executed.
Input and output (I/O) equipment operated by the control unit.
Fig.1 Basic structure of a computer.
1.2 Review of basic operational concepts
Now we focus on the processing unit, which executes machine instructions and
coordinates the activities of other units. This unit is often called the Instruction Set
Processor (ISP), or simply the processor. We examine its internal structure and how it
performs the tasks of fetching, decoding, and executing instructions of a program. The
processing unit used to be called the central processing unit (CPU). The term "central" is
less appropriate today because many modern computer systems include several
processing units.
The organization of processors has evolved over the years, driven by developments in
technology and the need to provide high performance. A common strategy in the
development of high-performance processors is to make various functional units operate
in parallel as much as possible. High-performance processors have a pipelined
organization where the execution of one instruction is started before the execution of the
preceding instruction is completed. In another approach, known as superscalar operation,
several instructions are fetched and executed at the same time. Pipelining and superscalar
architectures are discussed later.
A typical computing task consists of a series of steps specified by a sequence of machine
instructions that constitute a program. An instruction is executed by carrying out a
sequence of more rudimentary operations. These operations and the means by which they
are controlled are the main topic of this chapter.
1.3 CPU - single bus organization
To execute a program, the processor fetches one instruction at a time and performs the
operations specified. Instructions are fetched from successive memory locations until a
branch or a jump instruction is encountered. The processor keeps track of the address of
the memory location containing the next instruction to be fetched using the program
counter, PC. After fetching an instruction, the contents of the PC are updated to point to
the next instruction in the sequence. A branch instruction may load a different value into
the PC.
Another key register in the processor is the instruction register, IR. Suppose that
each instruction comprises 4 bytes, and that it is stored in one memory word. To execute
an instruction, the processor has to perform the following three steps:
1. Fetch the contents of the memory location pointed to by the PC. The contents of this
location are interpreted as an instruction to be executed. Hence, they are loaded into the
IR. Symbolically, this can be written as
IR ← [[PC]]
2. Assuming that the memory is byte addressable, increment the contents of the PC by 4,
that is,
PC ← [PC] + 4
3. Carry out the actions specified by the instruction in the IR.
In cases where an instruction occupies more than one word, steps 1 and 2 must be
repeated as many times as necessary to fetch the complete instruction. These two steps
are usually referred to as the fetch phase; step 3 constitutes the execution phase.
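The fetch-phase steps can be sketched as a toy Python model. This is only an illustration of the register transfers above; the word size, addresses, and memory contents are assumptions chosen for the example.

```python
# Toy model of the fetch phase: IR <- [[PC]], then PC <- [PC] + 4.
# Memory is modeled as a dict from byte address to instruction word;
# the 4-byte word size matches the assumption in the text.

WORD_SIZE = 4

def fetch(memory, pc):
    """Return (ir, new_pc) after one fetch phase."""
    ir = memory[pc]          # step 1: load the addressed word into IR
    new_pc = pc + WORD_SIZE  # step 2: point PC at the next instruction
    return ir, new_pc

memory = {0: "Add (R3),R1", 4: "Move (R1),R2"}
ir, pc = fetch(memory, 0)
print(ir, pc)   # the fetched instruction and the updated PC
```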
To study these operations in detail, we first need to examine the internal organization
of the processor. Its components can be organized and interconnected in a variety of
ways. We will start with a very simple organization. Later in this chapter and in Chapter
8 we will present more
complex structures that provide high performance. Figure 1.1 shows an organization in
which the arithmetic and logic unit (ALU) and all the registers are interconnected via a
single common bus. This bus is internal to the processor and should not be confused with
the external bus that connects the processor to the memory and I/O devices.
The data and address lines of the external memory bus are shown in Figure 1.1 connected
to the internal processor bus via the memory data register, MDR, and the memory address
register, MAR, respectively. Register MDR has two inputs and two outputs. Data may be
loaded into MDR either from the memory bus or from the internal processor bus. The
data stored in MDR may be placed on either bus. The input of MAR is connected to the
internal bus, and its output is connected to the external bus. The control lines of the
memory bus are connected to the instruction decoder and control logic block. This unit is
responsible for issuing the signals that control the operation of all the units inside the
processor and for interacting with the memory bus.
The number and use of the processor registers R0 through R(n - 1) vary considerably
from one processor to another. Registers may be provided for general-purpose use by the
programmer. Some may be dedicated as special-purpose registers, such as index registers
or stack pointers. Three registers, Y, Z, and TEMP in Figure 1.1, have not been
mentioned before. These registers are transparent to the programmer, that is, the
programmer need not be concerned with them because they are never referenced
explicitly by any instruction. They are used by the processor for temporary storage during
execution of some instructions. These registers are never used for storing data generated
by one instruction for later use by another instruction.
The multiplexer MUX selects either the output of register Y or a constant value 4 to be
provided as input A of the ALU. The constant 4 is used to increment the contents of the
program counter. We will refer to the two possible values of the MUX control input
Select as Select4 and SelectY for selecting the constant 4 or register Y, respectively.
As instruction execution progresses, data are transferred from one register to another,
often passing through the ALU to perform some arithmetic or logic operation. The
instruction decoder and control logic unit is responsible for implementing the actions
specified by the instruction loaded in the IR register. The decoder generates the control
signals needed to select the registers involved and direct the transfer of data. The
registers, the ALU, and the interconnecting bus are collectively referred to as the
datapath.
With few exceptions, an instruction can be executed by performing one or more of
the following operations in some specified sequence:
• Transfer a word of data from one processor register to another or to the ALU
• Perform an arithmetic or a logic operation and store the result in a processor register
• Fetch the contents of a given memory location and load them into a processor register
• Store a word of data from a processor register into a given memory location
We now consider in detail how each of these operations is implemented, using the
simple processor model in Figure 1.1.
Fig.1.1 Single bus organization of the data path inside a processor
1. 3 .1 REGISTER TRANSFERS
Instruction execution involves a sequence of steps in which data are transferred from one register
to another. For each register, two control signals are used to place the contents of that register on
the bus or to load the data on the bus into the register. This is represented symbolically in Figure
1.2. The input and output of register Ri are connected to the bus via switches controlled by the
signals Riin and Riout, respectively. When Riin is set to 1, the data on the bus are loaded into Ri.
Similarly, when Riout is
set to 1, the contents of register Ri are placed on the bus. While Riout is equal to 0, the
bus can be used for transferring data from other registers.
Suppose that we wish to transfer the contents of register R1 to register R4. This can be
accomplished as follows:
Enable the output of register R1 by setting R1out to 1. This places the contents of R1 on the
processor bus.
Enable the input of register R4 by setting R4in to 1. This loads data from the processor
bus into register R4.
All operations and data transfers within the processor take place within time periods
defined by the processor clock. The control signals that govern a particular transfer are
asserted at the start of the clock cycle. In our example, R1out and R4in are set to 1.
The registers consist of edge-triggered flip-flops. Hence, at the next active edge of the
clock, the flip-flops that constitute R4 will load the data present at their inputs. At the
same time, the control signals R1out and R4in will return to 0. We will use this simple
model of the timing of data transfers for the rest of this chapter. However, we should
point out that other schemes are possible. For example, data transfers may use both the
rising and falling edges of the clock. Also, when edge-triggered flip-flops are not used,
two or more clock signals may be needed to guarantee proper transfer of data. This is
known as multiphase clocking.
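The gating scheme described above can be modeled with a small Python function. This is an illustrative sketch, not a hardware simulator: real registers use tri-state drivers and edge-triggered flip-flops, and the register names and values here are made up.

```python
# One clock cycle on the single internal bus: exactly one register
# drives the bus (its Riout signal is 1), and every register whose
# Riin signal is 1 latches the bus value at the clock edge.

def clock_cycle(registers, out_signal, in_signals):
    """Simulate one bus transfer; mutates and returns `registers`."""
    bus = registers[out_signal]      # Riout = 1 places contents on the bus
    for dst in in_signals:           # each Riin = 1 loads from the bus
        registers[dst] = bus
    return registers

regs = {"R1": 25, "R4": 0}
clock_cycle(regs, out_signal="R1", in_signals=["R4"])   # R1out, R4in
print(regs["R4"])   # 25
```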
An implementation for one bit of register Ri is shown in Figure 1.3 as an example. A
two-input multiplexer is used to select the data applied to the input of an edge-triggered
D flip-flop. When the control input Riin is equal to 1, the multiplexer selects the data
on the bus. This data will be loaded into the flip-flop at the rising edge of the clock.
When Riin is equal to 0, the multiplexer feeds back the value currently stored in the
flip-flop.
Fig.1.2 Input and output gating for the registers in fig 1.1
Fig. 1.3 Input and output gating for one register bit
The Q output of the flip-flop is connected to the bus via a tri-state gate. When Riout
is equal to 0, the gate's output is in the high-impedance (electrically disconnected) state.
This corresponds to the open-circuit state of a switch. When Riout = 1, the gate drives
the bus to 0 or 1, depending on the value of Q.
1.3.2 PERFORMING AN ARITHMETIC OR LOGIC OPERATION
The ALU is a combinational circuit that has no internal storage. It performs arithmetic
and logic operations on the two operands applied to its A and B inputs. In Figures 1.1
and 1.2, one of the operands is the output of the multiplexer MUX and the other operand is
obtained directly from the bus. The result produced by the ALU is stored temporarily in
register Z. Therefore, a sequence of operations to add the contents of register R1 to those
of register R2 and store the result in register R3 is
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in, End
The signals whose names are given in any step are activated for the duration of the clock
cycle corresponding to that step. All other signals are inactive. Hence, in step 1, the
output of register R1 and the input of register Y are enabled, causing the contents of R1 to
be transferred over the bus to Y. In step 2, the multiplexer's Select signal is set to SelectY,
causing the multiplexer to gate the contents of register Y to input A of the ALU. At the
same time, the contents of register R2 are gated onto the bus and, hence, to input B. The
function performed by the ALU depends on the signals applied to its control lines. In this
case, the Add line is set to 1, causing the output of the ALU to be the sum of the two
numbers at inputs A and B. This sum is loaded into register Z because its input control
signal is activated. In step 3, the contents of register Z are transferred to the destination
register, R3. This last transfer cannot be carried out during step 2, because only one
register output can be connected to the bus during any clock cycle.
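The three-step sequence can be traced in a short Python sketch. The register values are invented for illustration, and the Y and Z variables stand in for the ALU input and output latches of Figure 1.1.

```python
# Trace of the three control steps that compute R3 <- [R1] + [R2]
# on the single-bus datapath. Only one register drives the bus per step.

regs = {"R1": 10, "R2": 32, "R3": 0, "Y": 0, "Z": 0}

# Step 1: R1out, Yin -- copy R1 into the ALU input latch Y
bus = regs["R1"]
regs["Y"] = bus

# Step 2: R2out, SelectY, Add, Zin -- ALU adds Y (input A) and bus (input B)
bus = regs["R2"]
regs["Z"] = regs["Y"] + bus

# Step 3: Zout, R3in -- transfer the sum to the destination register
bus = regs["Z"]
regs["R3"] = bus

print(regs["R3"])   # 42
```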
In this introductory discussion, we assume that there is a dedicated signal for each
function to be performed. For example, we assume that there are separate control signals
to specify individual ALU operations, such as Add, Subtract, XOR, and so on. In reality,
some degree of encoding is likely to be used. For example, if the ALU can perform eight
different operations, three control signals would suffice to specify the required operation.
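The encoding argument can be checked with a quick calculation. The list of operation names below is hypothetical; only the count matters.

```python
# With 8 distinct ALU operations, ceil(log2(8)) = 3 control bits
# suffice to select any one of them.

ops = ["Add", "Subtract", "AND", "OR", "XOR", "NOT", "SHL", "SHR"]
bits_needed = (len(ops) - 1).bit_length()   # ceil(log2(8)) = 3
encoding = {op: format(i, f"0{bits_needed}b") for i, op in enumerate(ops)}

print(bits_needed)       # 3
print(encoding["XOR"])   # '100'
```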
1.3.3 FETCHING A WORD FROM MEMORY
To fetch a word of information from memory, the processor has to specify the address of
the memory location where this information is stored and request a Read operation. This
applies whether the information to be fetched represents an instruction in a program or an
operand specified by an instruction. The processor transfers the required address to the
MAR, whose output is connected to the address lines of the memory bus. At the same
time, the processor uses the control lines of the memory bus to indicate that a Read
operation is needed. When the requested data are received from the memory they are
stored in register MDR, from where they can be transferred to other registers in the
processor.
The connections for register MDR are illustrated in Figure 1.4. It has four control signals:
MDRin and MDRout control the connection to the internal bus, and MDRinE and
MDRoutE control the connection to the external bus. The circuit in Figure 1.3 is easily
modified to provide the additional connections. A three-input multiplexer can be used,
with the memory bus data line connected to the third input. This input is selected when
MDRinE = 1. A second tri-state gate, controlled by MDRoutE, can be used to connect the
output of the flip-flop to the memory bus.
During memory Read and Write operations, the timing of internal processor operations
must be coordinated with the response of the addressed device on the memory bus. The
processor completes one internal data transfer in one clock cycle. The speed of operation
of the addressed device, on the other hand, varies with the device. We saw in Chapter 5
that modern processors include a cache memory on the same chip as the processor.
Typically, a cache will respond to a memory read request in one clock cycle. However,
when a cache miss occurs, the request is forwarded to the main memory, which
introduces a delay of several clock cycles. A read or write request may also be intended
for a register in a memory-mapped I/O device.
Fig 1.4 Connection and control signals for register MDR
Such I/O registers are not cached, so their accesses always take a number of clock cycles.
To accommodate the variability in response time, the processor waits until it receives an
indication that the requested Read operation has been completed. We will assume that a
control signal called Memory-Function-Completed (MFC) is used for this purpose. The
addressed device sets this signal to 1 to indicate that the contents of the specified
location have been read and are available on the data lines of the memory bus.
As an example of a read operation, consider the instruction Move (R1),R2. The
actions needed to execute this instruction are:
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
These actions may be carried out as separate steps, but some can be combined into
a single step. Each action can be completed in one clock cycle, except action 3 which
requires one or more clock cycles, depending on the speed of the addressed device.
For simplicity, let us assume that the output of MAR is enabled all the time. Thus, the
contents of MAR are always available on the address lines of the memory bus. This
is the case when the processor is the bus master. When a new address is loaded into
MAR, it will appear on the memory bus at the beginning of the next clock cycle, as
shown in Figure 1.5. A Read control signal is activated at the same time MAR is loaded.
This signal will cause the bus interface circuit to send a read command, MR, on the bus.
With this arrangement, we have combined actions 1 and 2 above into a single control
step. Actions 3 and 4 can also be combined by activating control signal MDRinE while
waiting for a response from the memory. Thus, the data received from the memory are
loaded into MDR at the end of the clock cycle in which the MFC signal is received. In
the next clock cycle, MDRout is activated to transfer the data to register R2. This means
that the memory read operation requires three steps, which can be described by the
signals being activated as follows:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
where WMFC is the control signal that causes the processor‘s control circuitry to wait
for the arrival of the MFC signal. Figure 1.5 shows that MDRinE is set to 1 for exactly the
same period as the read command, MR. Hence, in subsequent discussion, we will not
specify the value of MDRinE explicitly, with the understanding that it is always equal to
MR.
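The three-step read can be sketched in Python. The addresses, values, and latency are illustrative assumptions, and WMFC is modeled simply as a count of wait cycles rather than a real handshake.

```python
# Toy model of Move (R1),R2: load MAR, issue Read, wait for MFC
# (modeled as `latency` extra cycles), latch MDR, transfer to R2.

def memory_read(regs, memory, latency):
    """Mutates `regs`; returns the total clock cycles consumed."""
    regs["MAR"] = regs["R1"]            # step 1: R1out, MARin, Read
    cycles = 2 + latency                # steps 1 and 3, plus the WMFC wait
    regs["MDR"] = memory[regs["MAR"]]   # step 2: MDRinE latches when MFC = 1
    regs["R2"] = regs["MDR"]            # step 3: MDRout, R2in
    return cycles

regs = {"R1": 0x1000, "R2": 0, "MAR": 0, "MDR": 0}
mem = {0x1000: 99}
print(memory_read(regs, mem, latency=1), regs["R2"])   # 3 99
```

With a one-cycle (cache-like) response the whole read takes three cycles; a slower device stretches only the WMFC step.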
Fig.1.5 Timing of a memory Read operation.
1.3.4 STORING A WORD IN MEMORY
Writing a word into a memory location follows a similar procedure. The desired address
is loaded into MAR. Then, the data to be written are loaded into MDR, and a Write
command is issued. Hence, executing the instruction Move R2,(R1) requires the following
sequence:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
As in the case of the read operation, the Write control signal causes the memory bus
interface hardware to issue a Write command on the memory bus. The processor remains
in step 3 until the memory operation is completed and an MFC response is received.
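The write sequence can be sketched the same way; the addresses and values below are illustrative, and the MFC wait is not modeled explicitly.

```python
# Toy model of Move R2,(R1): load MAR with the destination address,
# load MDR with the data, then drive the memory bus until MFC arrives.

def memory_write(regs, memory):
    regs["MAR"] = regs["R1"]            # step 1: R1out, MARin
    regs["MDR"] = regs["R2"]            # step 2: R2out, MDRin, Write
    memory[regs["MAR"]] = regs["MDR"]   # step 3: MDRoutE, WMFC
    return memory

regs = {"R1": 0x2000, "R2": 7, "MAR": 0, "MDR": 0}
mem = {}
memory_write(regs, mem)
print(mem[0x2000])   # 7
```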
1.4 MULTIPLE-BUS ORGANIZATION
We used the simple single-bus structure of Figure 1.1 to illustrate the basic ideas. The
resulting control sequences in Figures 1.6 and 1.7 are quite long because only one data
item can be transferred over the bus in a clock cycle. To reduce the number of steps
needed, most commercial processors provide multiple internal paths that enable several
transfers to take place in parallel.
Fig.1.8 Three-bus organization of the data path.
Figure 1.8 depicts a three-bus structure used to connect the registers and the ALU of a
processor. All general-purpose registers are combined into a single block called the
register file. The register file in Figure 1.8 is said to have three ports. There are two
outputs, allowing the contents of two different registers to be accessed simultaneously
and have their contents placed on buses A and B. The third port allows the data on bus C
to be loaded into a third register during the same clock cycle. Buses A and B are used to
transfer the source operands to the A and B inputs of the ALU, where an arithmetic or
logic operation may be performed. The result is transferred to the destination over bus C.
If needed, the ALU may simply pass one of its two input operands unmodified to bus C.
We will call the ALU control signals for such an operation R=A or R=B. The three-bus
arrangement obviates the need for registers Y and Z in Figure 1.1.
A second feature in Figure 1.8 is the introduction of the Incrementer unit, which is used
to increment the PC by 4. Using the Incrementer eliminates the need to add 4 to the PC
using the main ALU, as was done in Figures 1.6 and 1.7. The source for the constant 4 at
the ALU input multiplexer is still useful. It can be used to increment other addresses,
such as the memory addresses in LoadMultiple and StoreMultiple instructions.
Fig. 1.9 Control sequence for the instruction Add R4,R5,R6 for the three-bus
organization in Fig1.8
Consider the three-operand instruction
Add R4,R5,R6
The control sequence for executing this instruction is given in Figure 1.9. In step 1, the
contents of the PC are passed through the ALU, using the R=B control signal, and
loaded into the MAR to start a memory read operation. At the same time the PC is
incremented by 4. Note that the value loaded into MAR is the original contents of the PC.
The incremented value is loaded into the PC at the end of the clock cycle and will not
affect the contents of MAR. In step 2, the processor waits for MFC and loads the data
received into MDR, then transfers them to IR in step 3. Finally, the execution phase of
the instruction requires only one control step to complete, step 4. By providing more
paths for data transfer a significant reduction in the number of clock cycles needed to
execute an instruction is achieved.
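The single execution step for the three-operand add can be modeled as one call on a three-ported register file. This is an illustrative sketch; the register names follow the example above.

```python
# Three-bus datapath: ports A and B read two source registers in the
# same cycle, the ALU operates, and bus C writes the destination --
# the whole Add executes in one control step.

def three_bus_add(regfile, src1, src2, dst):
    bus_a = regfile[src1]          # output port A -> bus A -> ALU input A
    bus_b = regfile[src2]          # output port B -> bus B -> ALU input B
    regfile[dst] = bus_a + bus_b   # ALU result -> bus C -> input port
    return regfile

rf = {"R4": 5, "R5": 6, "R6": 0}
three_bus_add(rf, "R4", "R5", "R6")   # Add R4,R5,R6
print(rf["R6"])   # 11
```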
1.5 EXECUTION OF A COMPLETE INSTRUCTION
Let us now put together the sequence of elementary operations required to execute one
instruction. Consider the instruction
Add (R3),R1
which adds the contents of a memory location pointed to by R3 to register R1. Executing
this instruction requires the following actions:
1. Fetch the instruction.
2. Fetch the first operand (the contents of the memory location pointed to by R3).
3. Perform the addition.
4. Load the result into R1.
Figure 1.6 gives the sequence of control steps required to perform these operations for the
single-bus architecture of Figure 1.1. Instruction execution proceeds as follows. In step 1,
the instruction fetch operation is initiated by loading the contents of the PC into the MAR
and sending a Read request to the memory. The Select signal is set to Select4, which
causes the multiplexer MUX to select the constant 4. This value is added to the operand
at input B, which is the contents of the PC, and the result is stored in register Z. The
updated value is moved from register Z back into the PC during step 2, while waiting for
the memory to respond. In step 3, the word fetched from the memory is loaded into the
IR.
Steps 1 through 3 constitute the instruction fetch phase, which is the same for all
instructions. The instruction decoding circuit interprets the contents of the IR at the
beginning of step 4. This enables the control circuitry to activate the control signals for
steps 4 through 7, which constitute the execution phase. The contents of register R3 are
transferred to the MAR in step 4, and a memory read operation is initiated.
Fig. 1.6. Control signals for the execution of the instruction Add (R3),R1.
Then the contents of R1 are transferred to register Y in step 5, to prepare for the addition
operation. When the Read operation is completed, the memory operand is available in
register MDR, and the addition operation is performed in step 6. The contents of MDR
are gated to the bus, and thus also to the B input of the ALU, and register Y is selected as
the second input to the ALU by choosing SelectY. The sum is stored in register Z, then
transferred to R1 in step 7. The End signal causes a new instruction fetch cycle to begin
by returning to step 1.
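The seven control steps of Figure 1.6 can be traced in a small Python model. The addresses, operand values, and dictionary-based "memory" are assumptions made for the example; memory latency is folded into the WMFC steps.

```python
# Trace of Add (R3),R1 on the single-bus datapath.
# Steps 1-3: fetch phase; steps 4-7: execution phase.

def execute_add_indirect(regs, memory):
    # Step 1: PCout, MARin, Read, Select4, Add, Zin
    regs["MAR"] = regs["PC"]
    regs["Z"] = regs["PC"] + 4
    # Step 2: Zout, PCin, Yin, WMFC -- updated PC also copied into Y
    regs["PC"] = regs["Y"] = regs["Z"]
    # Step 3: MDRout, IRin -- fetched word becomes the instruction
    regs["IR"] = memory[regs["MAR"]]
    # Step 4: R3out, MARin, Read -- start fetching the memory operand
    regs["MAR"] = regs["R3"]
    # Step 5: R1out, Yin, WMFC
    regs["Y"] = regs["R1"]
    regs["MDR"] = memory[regs["MAR"]]
    # Step 6: MDRout, SelectY, Add, Zin
    regs["Z"] = regs["Y"] + regs["MDR"]
    # Step 7: Zout, R1in, End
    regs["R1"] = regs["Z"]
    return regs

regs = {"PC": 0, "R1": 30, "R3": 0x100, "Y": 0, "Z": 0,
        "MAR": 0, "MDR": 0, "IR": None}
mem = {0: "Add (R3),R1", 0x100: 12}
execute_add_indirect(regs, mem)
print(regs["R1"], regs["PC"])   # 42 4
```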
This discussion accounts for all control signals in Figure 1.6 except Yin in step 2. There is
no need to copy the updated contents of PC into register Y when executing the Add
instruction. But, in Branch instructions the updated value of the PC is needed to compute
the Branch target address. To speed up the execution of Branch instructions, this value is
copied into register Y in step 2. Since step 2 is part of the fetch phase, the same action
will be performed for all instructions. This does not cause any harm because register Y is
not used for any other purpose at that time.
Branch Instruction
A branch instruction replaces the contents of the PC with the branch target address. This
address is usually obtained by adding an offset X, which is given in the branch
instruction, to the updated value of the PC. Figure 1.7 gives a control sequence that
implements an unconditional branch instruction. Processing starts, as usual, with the fetch
phase. This phase ends when the instruction is loaded into the IR in step 3. The offset
value is extracted from the IR by the instruction decoding circuit, which will also perform
sign extension if required. Since the value of the updated PC is already available in
register Y, the offset X is gated onto the bus in step 4, and an addition operation is
performed. The result, which is the branch target address, is loaded into the PC in step 5.
The offset X used in a branch instruction is usually the difference between the branch
target address and the address immediately following the branch instruction.
Fig.1.7 Control sequence for an unconditional branch instruction.
For example, if the branch instruction is at location 2000 and if the branch target address
is 2050, the value of X must be 46. The reason for this can be readily appreciated from
the control sequence in Figure 1.7. The PC is incremented during the fetch phase, before
knowing the type of instruction being executed. Thus, when the branch address is
computed in step 4, the PC value used is the updated value, which points to the
instruction following the branch instruction in the memory.
Consider now a conditional branch. In this case, we need to check the status of the
condition codes before loading a new value into the PC. For example, for a Branch-on-
negative (Branch<0) instruction, step 4 in Figure 1.7 is replaced with
Offset-field-of-IRout, Add, Zin, If N = 0 then End
Thus, if N = 0 the processor returns to step 1 immediately after step 4. If N = 1, step 5 is
performed to load a new value into the PC, thus performing the branch operation.
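The offset arithmetic and the N-flag check can be verified with a short sketch. The function name and flag parameter are illustrative; 4-byte instructions are assumed, as in the example above.

```python
# The offset X is taken relative to the *updated* PC: the PC is
# incremented during the fetch phase, before the instruction type
# is known, so X = target - (branch_address + 4).

def branch(pc, offset, n_flag, conditional=True):
    updated_pc = pc + 4                # PC already incremented in fetch
    if conditional and n_flag == 0:
        return updated_pc              # condition false: fall through
    return updated_pc + offset         # step 5: PC <- [PC] + X

print(2050 - (2000 + 4))              # 46: the offset stored for a
                                      # branch at 2000 targeting 2050
print(branch(2000, 46, n_flag=1))     # 2050: branch taken
print(branch(2000, 46, n_flag=0))     # 2004: branch not taken
```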
1.6 Interconnection structures
A computer consists of a set of components or modules (processor, memory, I/O) that
communicate with each other. A computer is a network of modules. There must be paths
for connecting these modules. The collection of paths connecting the various modules is
called the interconnection structure.
• Memory
o Consists of N words of equal length
o Each word assigned a unique numerical address (0, 1, …, N-1)
o A word of data can be read or written
o Operation specified by control signals
o Location specified by address signals
• I/O Module
o Similar to memory from the computer's viewpoint
o Consists of M external device ports (0, 1, …, M-1)
o External data paths for input and output
o Sends interrupt signal to the processor
• Processor
o Reads in instructions and data
o Writes out data after processing
o Uses control signals to control overall operation of the system
o Receives interrupt signals
The preceding list defines the data to be exchanged. The interconnection structure must
support the following types of transfers:
• Memory to processor: processor reads an instruction or a unit of data from memory.
• Processor to memory: processor writes a unit of data to memory.
• I/O to processor: processor reads data from an I/O device via an I/O module.
• Processor to I/O: processor sends data to the I/O device via an I/O module.
• I/O to or from memory: an I/O module is allowed to exchange data directly with
memory, without going through the processor, using direct memory access (DMA).
Over the years, a number of interconnection structures have been tried. By far the most
common is the bus and various multiple-bus structures.
Bus Interconnection
A bus is a communication pathway connecting two or more devices. Multiple devices can
be connected to the same bus at the same time. Typically, a bus consists of multiple
communication pathways, or lines. Each line is capable of transmitting signals
representing binary 1 or binary 0. A bus that connects major computer components
(processor, memory, I/O) is called a system bus.
Bus Structure
Typically, a bus consists of 50 to hundreds of separate lines. On any bus the lines are
grouped into three main function groups: data, address, and control. There may also be
power distribution lines for attached modules.
• Data lines
o Path for moving data and instructions between modules.
o Collectively are called the data bus.
o Width of 8, 16, 32, 64, or more bits – a key factor in overall system performance
• Address lines
o Identifies the source or destination of the data on the data bus, e.g., when the
CPU needs to read an instruction or data from a given memory location.
o Bus width determines the maximum possible memory capacity for the system.
For example, the 8080 has 16-bit addresses, giving access to 64K addresses.
• Control lines
o Used to control the access to and the use of the data and address lines.
o Transmits command and timing information between modules.
Typical control lines include the following:
• Memory write: causes data on the bus to be written to the addressed memory location.
• Memory read: causes data from the addressed memory location to be placed on the bus.
• I/O write: causes data on the bus to be output to the addressed I/O port.
• I/O read: causes data from the addressed I/O port to be placed on the bus.
• Transfer ACK: indicates that data have been accepted from or placed on the bus.
• Bus request: indicates that a module needs to gain control of the bus.
• Bus grant: indicates that a requesting module has been granted control of the bus.
• Interrupt request: indicates that an interrupt is pending.
• Interrupt ACK: indicates that the pending interrupt has been recognized.
• Clock: used to synchronize operations.
• Reset: initializes all modules.
What does a bus look like?
• Parallel lines on a circuit board.
• Ribbon cables.
• Strip connectors of a circuit board.
o PCI, AGP, PCI Express, SCSI, etc…
• Sets of wires.
1.7 Layered view of a computer system.
PIPELINING
Pipelining is used in modern computers to achieve high performance. We begin by
explaining the basics of pipelining and how it can lead to improved performance. Then
we examine machine instruction features that facilitate pipelined execution, and we show
that the choice of instructions and instruction sequencing can have a significant effect on
performance. Pipelined organization requires sophisticated compilation techniques, and
optimizing compilers have been developed for this purpose. Among other things, such
compilers rearrange the sequence of operations to maximize the benefits of pipelined
execution.
BASIC CONCEPTS
The speed of execution of programs is influenced by many factors. One way to improve
performance is to use faster circuit technology to build the processor and the main
memory. Another possibility is to arrange the hardware so that more than one operation
can be performed at the same time. In this way, the number of operations performed per
second is increased even though the elapsed time needed to perform any one operation is
not changed.
We have encountered concurrent activities several times before. For example, in
multiprogramming, DMA devices make I/O transfers and simultaneous computational
activity possible, because they can perform I/O transfers independently once these
transfers are initiated by the processor.
Pipelining is a particularly effective way of organizing concurrent activity in a computer
system. The basic idea is very simple. It is frequently encountered in manufacturing
plants, where pipelining is commonly known as an assembly-line operation. Readers are
undoubtedly familiar with the assembly line used in car manufacturing. The first station
in an assembly line may prepare the chassis of a car, the next station adds the body, the
next one installs the engine, and so on. While one group of workers is installing the
engine on one car, another group is fitting a car body on the chassis of another car, and
yet another group is preparing a new chassis for a third car. It may take days to complete
work on a given car, but it is possible to have a new car rolling off the end of the
assembly line every few minutes.
Consider how the idea of pipelining can be used in a computer. The processor executes a
program by fetching and executing instructions, one after the other. Let Fi and Ei refer to
the fetch and execute steps for instruction Ii. Execution of a program consists of a
sequence of fetch and execute steps, as shown in Fig. 1.10a.
Now consider a computer that has two separate hardware units, one for fetching
instructions and another for executing them, as shown in Figure 1.10b. The instruction
fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer is
needed to enable the execution unit to execute the instruction while the fetch unit is
fetching the next instruction. The results of execution are deposited in the destination
location specified by the instruction. For the purposes of this discussion, we assume that
both the source and the destination of the data operated on by the instructions are inside
the block labeled "Execution unit."
The computer is controlled by a clock whose period is such that the fetch and execute
steps of any instruction can each be completed in one clock cycle. Operation of the
computer proceeds as in Figure 1.10c. In the first clock cycle, the fetch unit fetches an
instruction I1 (step F1) and stores it in buffer B1 at the end of the clock cycle. In the
second clock cycle, the instruction fetch unit proceeds with the fetch operation for
instruction I2 (step F2). Meanwhile, the execution unit performs the operation specified
by instruction I1, which is available to it in buffer B1 (step E1). By the end of the second
clock cycle, the execution of instruction I1 is completed and instruction I2 is available.
Instruction I2 is stored in B1, replacing I1, which is no longer needed. Step E2 is
performed by the execution unit during the third clock cycle, while instruction I3 is being
fetched by the fetch unit. In this manner, both the fetch and execute units are kept busy
all the time. If the pattern in Figure 1.10c can be sustained for a long time, the completion
rate of instruction execution will be twice that achievable by the sequential operation
depicted in Figure 1.10a.
In summary, the fetch and execute units in Figure 1.10b constitute a two-stage pipeline in
which each stage performs one step in processing an instruction. An inter-stage storage
buffer, B1, is needed to hold the information being passed from one stage to the next.
New information is loaded into this buffer at the end of each clock cycle.
Fig. 1.10 Basic idea of instruction pipelining
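The timing just described can be sketched in Python (an illustration of mine, not part of the original notes): in every cycle the fetch unit works on the next instruction while the execute unit finishes the previous one, so one instruction completes per cycle once the pipeline is full.

```python
# A sketch of the two-stage fetch/execute overlap of Fig. 1.10:
# in cycle k the fetch unit fetches Ik while the execute unit
# works on I(k-1).  Function name is illustrative.
def two_stage_schedule(n_instructions):
    """Return (cycle, fetching, executing) tuples for n instructions."""
    schedule = []
    for cycle in range(1, n_instructions + 2):
        fetching = f"F{cycle}" if cycle <= n_instructions else "-"
        executing = f"E{cycle - 1}" if cycle >= 2 else "-"
        schedule.append((cycle, fetching, executing))
    return schedule

for cycle, f, e in two_stage_schedule(4):
    print(f"cycle {cycle}: fetch={f} execute={e}")
```

Four instructions finish in five cycles instead of eight, which is the factor-of-two speedup mentioned above.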
The processing of an instruction need not be divided into only two steps. For example, a
pipelined processor may process each instruction in four steps, as follows:
F Fetch: read the instruction from the memory.
D Decode: decode the instruction and fetch the source operand(s).
E Execute: perform the operation specified by the instruction.
W Write: store the result in the destination location.
Fig 1.11 A 4-stage pipelining.
The sequence of events for this case is shown in Figure 1.11a. Four instructions are in
progress at any given time. This means that four distinct hardware units are needed, as
shown in Figure 1.11b. These units must be capable of performing their tasks
simultaneously and without interfering with one another. Information is passed from one
unit to the next through a storage buffer. As an instruction progresses through the
pipeline, all the information needed by the stages downstream must be passed along. For
example, during clock cycle 4, the information in the buffers is as follows:
• Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by
the instruction-decoding unit.
• Buffer B2 holds both the source operands for instruction I2 and the specification of the
operation to be performed. This is the information produced by the decoding hardware in
cycle 3. The buffer also holds the information needed for the write step of instruction I2
(step W2). Even though it is not needed by stage E, this information must be passed on to
stage W in the following clock cycle to enable that stage to perform the required Write
operation.
• Buffer B3 holds the results produced by the execution unit and the destination
information for instruction I1.
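The buffer occupancy described above follows a simple pattern: instruction Ii reaches pipeline stage s in clock cycle i + s. A small Python sketch (the helper name is mine, not from the notes):

```python
# A sketch of the 4-stage F/D/E/W pipeline of Fig. 1.11:
# instruction Ii occupies stage s (0-based) during cycle i + s.
STAGES = ["F", "D", "E", "W"]

def stage_contents(cycle, n_instructions):
    """Map each stage to the instruction it processes in a given cycle."""
    contents = {}
    for s, name in enumerate(STAGES):   # s = 0 for F, 3 for W
        i = cycle - s                   # instruction number in this stage
        contents[name] = f"I{i}" if 1 <= i <= n_instructions else "-"
    return contents

print(stage_contents(4, 8))   # {'F': 'I4', 'D': 'I3', 'E': 'I2', 'W': 'I1'}
```

During cycle 4 the output matches the description above: I3 is being decoded, I2 executed, and I1 written back.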
PIPELINE PERFORMANCE
The pipelined processor in Figure 1.11 completes the processing of one instruction in
each clock cycle, which means that the rate of instruction processing is four times that of
sequential operation. The potential increase in performance resulting from pipelining is
proportional to the number of pipeline stages. However, this increase would be achieved
only if pipelined operation as depicted in Figure 1.11a could be sustained without
interruption throughout program execution. Unfortunately, this is not the case.
For a variety of reasons, one of the pipeline stages may not be able to complete its
processing task for a given instruction in the time allotted. For example, stage E in the
four-stage pipeline of Figure 1.11b is responsible for arithmetic and logic operations, and
one clock cycle is assigned for this task. Although this may be sufficient for most
operations, some operations, such as divide, may require more time to complete. Figure
1.12 shows an example in which the operation specified in instruction I2 requires three
cycles to complete, from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write stage
must be told to do nothing, because it has no data to work with. Meanwhile, the
information in buffer B2 must remain intact until the Execute stage has completed its
operation. This means that stage 2 and, in turn, stage I are blocked from accepting new
instructions because the information in B 1 cannot be overwritten. Thus, steps D4 and F5
must be postponed as shown.
Pipelined operation in Figure 1.12 is said to have been stalled for two clock cycles.
Normal pipelined operation resumes in cycle 7. Any condition that causes the pipeline to
stall is called a hazard. We have just seen an example of a data hazard. A data hazard is
any condition in which either the source or the destination operands of an instruction are
not available at the time expected in the pipeline. As a result some operation has to be
delayed, and the pipeline stalls.
Fig. 1.12 Effect of an execution operation taking more than one clock cycle.
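The stall can be reproduced with a small model. The sketch below (my own simplification, assuming a 4-stage pipeline with a configurable Execute latency) shows I2's three-cycle Execute step pushing every later completion back by two cycles:

```python
# A sketch of the stall in Fig. 1.12: Execute normally takes 1 cycle,
# but a longer E step (e.g. a divide) delays all later instructions.
def completion_cycle(i, execute_cycles):
    """Cycle in which instruction Ii finishes its Write step.

    execute_cycles maps an instruction number to its E-stage latency
    (default 1); stalls propagate to all following instructions.
    """
    finish_e = 0
    for k in range(1, i + 1):
        start_e = max(k + 2, finish_e + 1)     # E cannot start before D is done
        finish_e = start_e + execute_cycles.get(k, 1) - 1
    return finish_e + 1                        # W takes one more cycle

print(completion_cycle(4, {2: 3}), completion_cycle(4, {}))   # 9 vs. 7
```

With I2's Execute lasting cycles 4 to 6, W2 happens in cycle 7 and I4 completes in cycle 9 instead of 7 — a two-cycle stall, as in the figure.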
CPU-ARITHMETIC
Binary number system
The number system followed by computers
Base is two and any number is represented as an array containing 1's and 0's
representing coefficients of powers of two.
Used in computer systems because of the ease of representing 1 and 0 as two
levels of voltage/power – high and low.
To represent the decimal system directly, ten voltage levels would be required,
with correspondingly complex hardware.
Binary arithmetic
• Addition
• Subtraction
• Multiplication
• Division
Binary addition
• Four basic rules for elementary addition
• 0 + 0 = 0 ; 0 + 1 = 1; 1 + 0 = 1; 1 + 1 = 10;
• Carry-overs are performed in the same manner as in decimal addition
• 11001 + 1001 = ?
• How to add multiple (more than two) binary numbers?
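To answer the question above: multiple binary numbers are added two at a time, reusing the same four rules. A Python sketch of column-by-column addition with carry-over (the helper name `add_binary` is mine):

```python
# A sketch applying the four addition rules bit by bit,
# carrying over exactly as in decimal addition.
def add_binary(x, y):
    """Add two binary strings the long way, column by column."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    carry, digits = 0, []
    for xi, yi in zip(reversed(x), reversed(y)):
        s = int(xi) + int(yi) + carry     # 0+0, 0+1, 1+0, or 1+1
        digits.append(str(s % 2))
        carry = s // 2                    # carry-over to the next column
    if carry:
        digits.append("1")
    return "".join(reversed(digits))

print(add_binary("11001", "1001"))   # 100010
```

More than two numbers are summed by repeated pairwise addition, e.g. `add_binary(add_binary(a, b), c)`.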
Binary subtraction
• Four rules for elementary subtraction
• 0 – 0 = 0; 1 – 0 = 1; 1 – 1 = 0;
• 0 – 1 = 1, but with a borrow of 1 from the next column of minuend
• 1101 – 1100 = 0001
• 1100 - 1001 =?
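A matching sketch for subtraction with borrows (again my own illustration, not from the notes):

```python
# A sketch of the borrow rule: subtract column by column, borrowing
# 1 from the next column of the minuend whenever the digit is 0 - 1.
def sub_binary(x, y):
    """Compute x - y for binary strings with x >= y."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    borrow, digits = 0, []
    for xi, yi in zip(reversed(x), reversed(y)):
        d = int(xi) - int(yi) - borrow
        borrow = 1 if d < 0 else 0       # borrow from the next column
        digits.append(str(d % 2))
    return "".join(reversed(digits)).lstrip("0") or "0"

print(sub_binary("1100", "1001"))   # 11
```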
Signed binary numbers
• Like in decimal, we need to represent negative
numbers in binary system too
The decimal system uses a '-' sign, but computers understand only 1's and 0's.
A solution is to add a digit (sign bit) to represent the sign – this approach is called
Signed Magnitude Representation:
0 marks positive and 1 marks negative.
Problem! Have to specify the number of bits in the number to avoid
misinterpretation
Complements
With signed magnitude, representing a number is simple but arithmetic is complex.
It would be nice to have a representation in which addition is simple and other
operations can be done using addition:
multiplication is repeated addition; division is repeated subtraction.
Using complements we can perform subtraction using addition.
1’s complement
• Formed by complementing (changing 1 to 0 and 0 to 1) each bit in the number.
• The most significant bit tells us the sign of the number.
9 => 01001
-9 => 10110
Subtraction using 1’s complement
• To subtract, add the 1's complement of the subtrahend.
• If there is an overflow bit, add it to the remaining part (end-around carry).
0111 – 0101 => 0111 + 1010

  0111
+ 1010
------
1 0001    (overflow bit)

Add the overflow bit back to the remaining part (end-around carry):

  0001
+    1
------
  0010
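The same end-around-carry procedure in Python (the helper `ones_complement_sub` and its 4-bit default width are assumptions for illustration):

```python
# A sketch of 1's-complement subtraction with the end-around carry,
# using 4-bit words as in the 0111 - 0101 example.
def ones_complement_sub(x, y, bits=4):
    """Compute x - y (non-negative ints, x > y) via 1's complement."""
    mask = (1 << bits) - 1
    total = x + (~y & mask)          # add the 1's complement of y
    if total > mask:                 # an overflow bit appeared:
        total = (total & mask) + 1   # end-around carry
    return total & mask

print(format(ones_complement_sub(0b0111, 0b0101), "04b"))   # 0010
```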
2’s complement
• Add 1 to the 1's complement form
Addition
1. Represent both operands in signed-2's complement format (if operand X > 0, keep
its original binary form; if operand X < 0, take the 2's complement of X: 2^n - X).
2. Add the operands; discard any carry-out of the sign bit (MSB).
3. The result is automatically in signed-2's complement form.
Example (n = 6 bits):
 6 = 000110    -6 = 111010
 9 = 001001    -9 = 110111
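The three steps can be checked in Python; `to_twos` and `twos_add` below are illustrative names of mine, using the n = 6 values from the example:

```python
# A sketch of signed-2's-complement addition with n = 6 bits.
def to_twos(x, n=6):
    """Signed value -> n-bit 2's-complement pattern (2**n - |x| if x < 0)."""
    return x & ((1 << n) - 1)

def twos_add(x, y, n=6):
    """Add two signed values; the carry out of the MSB is discarded."""
    s = (to_twos(x, n) + to_twos(y, n)) & ((1 << n) - 1)
    return s - (1 << n) if s >= (1 << (n - 1)) else s   # decode sign bit

print(format(to_twos(-9), "06b"))   # 110111
print(twos_add(6, -9))              # -3
```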
ADDITION/SUBTRACTION
The 1 in the 7th bit is automatically dropped.
The MSB of the result is 1, indicating it is a negative result represented in signed 2's
complement form. Its value can be found by taking the 2's complement.
Subtraction
1. Represent both operands in signed-2's complement format.
2. Take 2's complement of the subtrahend B (which may be in complement form
already if it is negative).
3. Add it to the minuend A.
4. The result is automatically in signed-2's complement form.
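A worked sketch of these four steps in Python (my own example values; the helper `twos_sub` is illustrative):

```python
# A sketch of the four subtraction steps: take the 2's complement of
# the subtrahend and add it to the minuend, inside n-bit arithmetic.
def twos_sub(a, b, n=6):
    """Compute a - b by adding the 2's complement of b to a."""
    mask = (1 << n) - 1
    neg_b = (~b + 1) & mask              # step 2: 2's complement of B
    s = ((a & mask) + neg_b) & mask      # step 3: add; drop the carry-out
    return s - (1 << n) if s >= (1 << (n - 1)) else s

print(twos_sub(9, 6))    # 3
print(twos_sub(6, 9))    # -3
```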
Why does it work?
Consider the following three cases (where A > 0, B > 0):
1. A + B: this is a normal binary addition with a positive sum.
2. A - B (or A + (-B)): the negative value -B is represented by its 2's complement,
2^n - B, and
A + (2^n - B) = 2^n + (A - B)
If A - B > 0, the result is A - B in binary form, with the 2^n (the carry out of the
MSB) automatically dropped.
If A - B < 0, the result, 2^n - (B - A), is the 2's complement representation of the
negative value A - B.
3. -A - B: both negative values -A and -B are represented in 2's complement form as
2^n - A and 2^n - B, and
(2^n - A) + (2^n - B) = 2^n + (2^n - (A + B))
The first 2^n is automatically dropped, and the second term is the 2's complement
representation of the negative value -(A + B).
We see that signed 2's complement representation can properly deal with both addition
and subtraction with negative operands as well as positive ones.
Example (n = 4 bits):
Adding two positive operands can yield a result whose MSB is 1, indicating a negative
result represented in signed 2's complement form – a wrong result!
Likewise, adding two negative operands can produce a carry into the 5th bit, which is
dropped, leaving a result that looks positive – another wrong result!
The wrong results are caused by the overflow problem. Given n = 4 bits, the range of
valid values representable is -2^3 = -8 to 2^3 - 1 = +7.
The overflow problem can be detected by checking whether the carry-in Cin to and carry-
out Cout from the MSB are the same. Consider the sign bit of the following six cases of
addition:
It is obvious that when Cin ≠ Cout, the result is incorrect due to overflow.
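The Cin/Cout rule can be verified with a bit-serial addition sketch (an illustration of mine; the function name is hypothetical):

```python
# A sketch checking the overflow rule: add bit by bit and compare the
# carry into the MSB (Cin) with the carry out of it (Cout).
def add_with_overflow_check(a, b, n=4):
    """Return (n-bit sum pattern, overflow flag); overflow iff Cin != Cout."""
    carry = 0
    result = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s = ai + bi + carry
        result |= (s & 1) << i
        cin, carry = carry, s >> 1   # carry into / out of this stage
    return result, cin != carry      # after the loop: MSB's Cin vs. Cout

print(add_with_overflow_check(0b0111, 0b0001))   # (8, True): 7 + 1 overflows
```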
Hardware Implementation: An n-bit adder can be built by concatenating n full
adders:
This n-bit adder can also carry out subtraction A - B as well as addition A + B.
A control signal is used to control a 2x1 MUX to select either B (for addition, when the
control is 0) or its bit-wise complement (for subtraction, when the control is 1). The
subtraction is carried out by adding the 2's complement of operand B to A. (Recall that
the 2's complement can be obtained by bit-wise complement and adding 1 to the LSB;
the control signal also supplies this 1 as the carry-in.)
ADDITION AND SUBTRACTION OF SIGNED NUMBERS
Figure shows the logic truth table for the sum and carry-out functions for adding equally
weighted bits xi and yi in two numbers X and Y. The figure also shows logic expressions
for these functions, along with an example of addition of the 4-bit unsigned numbers 7
and 6. Note that each stage of the addition process must accommodate a carry-in bit.
Logic specification for a stage of binary addition
We use ci to represent the carry-in to the ith stage, which is the same as the carry-out
from the (i – 1)st stage. The logic expression for si in the above figure can be
implemented with a 3-input XOR gate, used in the following figure (a) as part of the
logic required for a single stage of binary addition.
stage of binary addition.
Logic for addition of binary vectors
The carry-out function, ci+1, is implemented with a two-level AND-OR logic circuit. A
convenient symbol for the complete circuit for a single stage of addition, called a full
adder (FA), is also shown in the figure. A cascaded connection of n full adder blocks, as
shown in Figure b, can be used to add two n-bit numbers. Since the carries must
propagate, or ripple, through this cascade, the configuration is called an n-bit ripple-carry
adder.
The carry-in, c0, into the least-significant-bit (LSB) position provides a convenient
means of adding 1 to a number. For instance, forming the 2's-complement of a number
involves adding 1 to the 1's-complement of the number. The carry signals are also useful
for interconnecting k adders to form an adder capable of handling input numbers that are
kn bits long, as shown in Figure c.
The n-bit adder in Figure b can be used to add 2's-complement numbers X and Y, where
the xn-1 and yn-1 bits are the sign bits. In this case, the carry-out bit cn is not part
of the answer. Overflow can only occur when the signs of the two operands are the same.
In this case, overflow obviously occurs if the sign of the result is different. Therefore, a
circuit to detect overflow can be added to the n-bit adder by implementing the logic
expression
Overflow = xn-1 yn-1 sn-1' + xn-1' yn-1' sn-1
It can also be shown that overflow occurs when the carry bits cn and cn-1 are different.
Therefore, a simpler alternative circuit for detecting overflow can be obtained by
implementing the expression cn XOR cn-1 with an XOR gate.
In order to perform the subtraction operation X - Y on 2's-complement numbers X and
Y, we form the 2's-complement of Y and add it to X. The logic circuit network shown in
the following figure can be used to perform either addition or subtraction based on the
value applied to the Add/Sub input control line. This line is set to 0 for addition, applying
the Y vector unchanged to one of the adder inputs along with a carry-in signal, c0, of 0.
When the Add/Sub control line is set to 1, the Y vector is 1's-complemented (that is, bit
complemented) by the XOR gates and c0 is set to 1 to complete the 2's-complementation
of Y. Remember that 2's-complementing a negative number is done in exactly the same
manner as for a positive number. An XOR gate can be added to the following figure to
detect the overflow condition cn XOR cn-1.
Binary addition subtraction logic network
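In this network the XOR gates conditionally complement Y and the control line doubles as the carry-in c0. A behavioral sketch (illustrative only, not the actual gate-level circuit):

```python
# A sketch of the Add/Sub network: each yi is XORed with the control
# line, which also feeds c0, so subtraction = addition of the 2's
# complement of Y.
def add_sub(x, y, add_sub_control, n=8):
    """add_sub_control = 0 for X + Y, 1 for X - Y (2's-complement operands)."""
    mask = (1 << n) - 1
    y_in = (y ^ (mask if add_sub_control else 0)) & mask   # the XOR gates
    c0 = add_sub_control                       # carry-in completes the 2's comp
    s = ((x & mask) + y_in + c0) & mask
    return s - (1 << n) if s >= (1 << (n - 1)) else s

print(add_sub(5, 3, 1))   # 2
print(add_sub(3, 5, 1))   # -2
```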
Binary Coded Decimal (BCD)
Introduction:
Binary is the most efficient storage scheme: every bit pattern represents a unique, valid
value. However, for some applications it may not be desirable to work with binary data.
For instance, the internal components of digital clocks keep track of the time in binary,
and the binary value must be converted to decimal before it can be displayed.
For a digital clock it is preferable to store the value as a series of decimal digits, where
each digit is separately represented by its binary equivalent. The most common format
used to represent decimal data this way is called binary coded decimal, or BCD.
BCD Numeric Format
Every four bits represent one decimal digit.
Only the decimal values 0 to 9 are used;
4-bit values above 9 are not used in BCD.
The unused 4-bit values are:
BCD Decimal
1010 10
1011 11
1100 12
1101 13
1110 14
1111 15
Multi-digit decimal numbers are stored as multiple groups of 4 bits per digit.
BCD is a signed notation: numbers may be
positive or negative.
For example, +27 is stored as 0 (sign) 0010 0111
and -27 as 1 (sign) 0010 0111.
BCD does not store negative numbers in two's complement.
Values represented
b3b2b1b0 | Sign & magnitude | 1's complement | 2's complement
0111 +7 +7 +7
0110 +6 +6 +6
0101 +5 +5 +5
0100 +4 +4 +4
0011 +3 +3 +3
0010 +2 +2 +2
0001 +1 +1 +1
0000 +0 +0 +0
1000 -0 -7 -8
1001 -1 -6 -7
1010 -2 -5 -6
1011 -3 -4 -5
1100 -4 -3 -4
1101 -5 -2 -3
1110 -6 -1 -2
1111 -7 -0 -1
Algorithms for Addition

   0101      5
 + 1001    + 9
 ------
   1110      Invalid BCD digit
 + 0110      Add 6
 ------
 1 0100      Correct answer: 1 4
BCD adder
If the result, S3 S2 S1 S0, is not a valid BCD digit, the multiplexer causes 6 to be added
to the result.
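One digit stage of the BCD adder can be sketched as follows (the helper `bcd_digit_add` is my own illustration of the add-6 correction, not the notes' circuit):

```python
# A sketch of one BCD digit stage: binary-add two digits plus a carry,
# and add 6 whenever the 4-bit result would be an invalid BCD digit.
def bcd_digit_add(a, b, carry_in=0):
    """Add two BCD digits; return (sum_digit, carry_out)."""
    s = a + b + carry_in
    if s > 9:            # invalid digit (1010-1111) or binary carry-out
        s += 6           # the add-6 correction
        return s & 0b1111, 1
    return s, 0

print(bcd_digit_add(5, 9))   # (4, 1)  -> BCD 14, as in the example
```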
Carry-Look-ahead Adder
There are several factors that contribute to the delay in the digital adders. One is the
propagation delay due to the internal structure of the gates, another factor is the loading
of the output buffers (due to fanout and net delays), and a third factor is the logic circuit
itself.
The propagation delay (or gate delay) of a gate is the time difference between the change
of the input and output signals.
Ripple-carry vs. Carry-look-ahead Adders
One type of circuit where the effect of gate delays is particularly clear is an adder. In
a 4-bit ripple-carry adder the result of an addition of two bits depends on the carry
generated by the addition of the previous two bits. Thus, the sum of the most significant
bit is only available after the carry signal has rippled through the adder from the least
significant stage to the most significant stage. This can be easily understood if one
considers the addition of the two 4-bit words 1111 + 0001, as shown in Figure 3.
Figure 3: Addition of two 4-bit numbers illustrating the generation of the carry-out bit
In this case, the addition (1 + 1 = 10 in binary) in the least significant stage causes a carry bit to
be generated. This carry bit will consequently generate another carry bit in the next stage,
and so on, until the final carry-out bit appears at the output. This requires the signal to
travel (ripple) through all the stages of the adder as illustrated in Figure 4 below. As a
result, the final Sum and Carry bits will be valid after a considerable delay. The carry-out
bit of the first stage will be valid after 4 gate delays (2 associated with the XOR gate and
1 each associated with the AND and OR gates).
From the schematic of Figure 4, one finds that the next carry-out (C2) will be valid after
an additional 2 gate delays (associated with the AND and OR gates) for a total of 6 gate
delays. In general the carry-out of an N-bit adder will be valid after 2N+2 gate delays. The
Sum bit will be valid an additional 2 gate delays after the carry-in signal. Thus the sum of
the most significant bit SN-1 will be valid after 2(N-1) + 2 +2 = 2N +2 gate delays. This
delay may be in addition to any delays associated with interconnections. It should be
mentioned that in case one implements the circuit in a FPGA, the delays may be different
from the above expression depending on how the logic has been placed in the look up
tables and how it has been divided among different CLBs.
Figure 4: Ripple-carry adder, illustrating the delay of the carry bit.
Features of a ripple-carry adder:
- Multiple full adders with carry-ins and carry-outs chained together
- Small layout area
- Large delay time
The disadvantage of the ripple-carry adder is that it can get very slow when one needs to
add many bits. For instance, for a 32-bit adder, the delay would be about 66 ns if one
assumes a gate delay of 1 ns. That would imply that the maximum frequency one can
operate this adder would be only 15 MHz! For fast applications, a better design is
required. The carry-look-ahead adder solves this problem by calculating the carry signals
in advance, based on the input signals. It is based on the fact that a carry signal will be
generated in two cases: (1) when both bits Ai and Bi are 1, or (2) when one of the two bits
is 1 and the carry-in (carry of the previous stage) is 1. Thus, one can write,
COUT = Ci+1 = Ai.Bi + (Ai ⊕ Bi).Ci (1)
The "⊕" stands for exclusive OR (XOR). One can also write this expression as
Ci+1 = Gi + Pi.Ci (2)
in which Gi = Ai.Bi (3)
Pi = Ai ⊕ Bi (4)
are called the Generate and Propagate terms, respectively.
Let's assume that the delay through an AND gate is one gate delay and through an XOR
gate is two gate delays. Notice that the Propagate and Generate terms only depend on the
input bits and thus will be valid after two and one gate delay, respectively. If one uses the
above expression to calculate the carry signals, one does not need to wait for the carry to
ripple through all the previous stages to find its proper value. Let's apply this to a 4-bit
adder to make it clear.
C1 = G0 + P0.C0 (5)
C2 = G1 + P1.C1 = G1 + P1.G0 + P1.P0.C0 (6)
C3 = G2 + P2.G1 + P2.P1.G0 + P2.P1.P0.C0 (7)
C4 = G3 + P3.G2 + P3.P2.G1 + P3P2.P1.G0 + P3P2.P1.P0.C0 (8)
Notice that the carry-out bit, C4, of the last stage will be available after four delays (two
gate delays to calculate the Propagate signal and two delays as a result of the AND and
OR gate). The Sum signal can be calculated as follows,
Si = Ai ⊕ Bi ⊕ Ci = Pi ⊕ Ci. (9)
The Sum bit will thus be available after two additional gate delays (due to the XOR gate)
or a total of six gate delays after the input signals Ai and Bi have been applied. The
advantage is that these delays will be the same independent of the number of bits one
needs to add, in contrast to the ripple-carry adder.
The carry-lookahead adder can be broken up in two modules: (1) the Partial Full Adder,
PFA, which generates Si, Pi and Gi as defined by equations 3, 4 and 9 above; and (2) the
Carry Look-ahead Logic, which generates the carry-out bits according to equations 5 to
8. The 4-bit adder can then be built by using 4 PFAs and the Carry Look-ahead logic
block as shown in Figure 5.
Figure 5: Block diagram of a 4-bit CLA.
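Equations (1) to (9) can be checked directly with a behavioral sketch (my own function name; bits are listed LSB first):

```python
# A sketch of a 4-bit CLA: generate/propagate terms, look-ahead
# carries C1..C4, and the sums, per equations (2)-(4) and (9).
def cla_4bit(a_bits, b_bits, c0=0):
    """a_bits, b_bits: [A0..A3] LSB first; returns (sum_bits, c4)."""
    g = [ai & bi for ai, bi in zip(a_bits, b_bits)]   # Gi = Ai.Bi    (3)
    p = [ai ^ bi for ai, bi in zip(a_bits, b_bits)]   # Pi = Ai ⊕ Bi  (4)
    c = [c0]
    for i in range(4):
        c.append(g[i] | (p[i] & c[i]))                # Ci+1 = Gi + Pi.Ci
    s = [p[i] ^ c[i] for i in range(4)]               # Si = Pi ⊕ Ci  (9)
    return s, c[4]

# 7 + 6 = 13: A = 0111, B = 0110, LSB first below
print(cla_4bit([1, 1, 1, 0], [0, 1, 1, 0]))   # ([1, 0, 1, 1], 0)
```

Iterating Ci+1 = Gi + Pi.Ci here gives the same values as the fully expanded equations (5) to (8); the hardware computes the expanded forms so that all carries appear after a fixed delay.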
The disadvantage of the carry-lookahead adder is that the carry logic is getting quite
complicated for more than 4 bits. For that reason, carry-look-ahead adders are usually
implemented as 4-bit modules and are used in a hierarchical structure to realize adders
that have multiples of 4 bits. Figure 6 shows the block diagram for a 16-bit CLA adder.
The circuit makes use of the same CLA Logic block as the one used in the 4-bit adder.
Notice that each 4-bit adder provides a group Propagate and Generate Signal, which is
used by the CLA Logic block. The group Propagate PG of a 4-bit adder will have the
following expressions,
PG = P3.P2.P1.P0 ; (10)
GG = G3 + P3.G2 + P3.P2.G1 + P3.P2.P1.G0 (11)
The group Propagate PG and Generate GG will be available after 3 and 4 gate delays,
respectively (one or two additional delays than the Pi and Gi signals, respectively).
Figure 6: Block diagram of a 16-bit CLA Adder
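Equations (10) and (11) in the same style (again an illustrative helper of mine):

```python
# A sketch of the group propagate/generate signals a 4-bit block
# hands to the next-level CLA logic, per equations (10) and (11).
def group_pg(g, p):
    """g, p: [G0..G3], [P0..P3] of one 4-bit block -> (PG, GG)."""
    pg = p[3] & p[2] & p[1] & p[0]                         # (10)
    gg = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]))                   # (11)
    return pg, gg

print(group_pg([0, 1, 1, 0], [1, 0, 0, 1]))   # (0, 1)
```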
MULTIPLICATION
Algorithms for Multiplication
   1101      Multiplicand M
 x 1011      Multiplier Q
 ------
   1101
  1101
 0000
1101
--------
10001111     Product P
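The paper-and-pencil method above is shift-and-add: one shifted copy of M for every 1 bit in Q. A sketch for non-negative operands (the function name is mine):

```python
# A sketch of shift-and-add multiplication: add M, shifted into
# position, for each multiplier bit that is 1.
def multiply(m, q):
    """Shift-and-add multiplication of non-negative integers."""
    product = 0
    shift = 0
    while q:
        if q & 1:                    # this multiplier bit is 1
            product += m << shift    # add M shifted into position
        q >>= 1
        shift += 1
    return product

print(bin(multiply(0b1101, 0b1011))[2:])   # 10001111
```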
Array multiplier
Combinational circuit
Product generated in one micro operation
Requires large number of gates
Became feasible after integrated circuits developed
For j multiplier and k multiplicand bits, it needs
o j x k AND gates
o j - 1 k-bit adders, to produce a product of j + k bits
Multiply Signed-2’s Complement
Booth algorithm:
This algorithm serves two purposes:
Fast multiplication when there are consecutive 0's or 1's in the multiplier.
Can be used for signed multiplication.
QR multiplier
Qn least significant bit of QR
Qn+1 previous least significant bit of QR
BR multiplicand
AC= 0
SC number of bits in multiplier
Algorithm:
1. Do SC times:
2. If QnQn+1 = 10
AC ← AC + BR' + 1 (i.e., AC ← AC - BR)
3. If QnQn+1 = 01
AC ← AC + BR
4. Arithmetic shift right AC, QR, and Qn+1
5. SC ← SC - 1
Explanation:
1. Depending on the current and previous bits, do one of the following:
00: a. Middle of a string of 0s, so no arithmetic operations.
01: b. End of a string of 1s, so add the multiplicand to the left
half of the product.
10: c. Beginning of a string of 1s, so subtract the multiplicand
from the left half of the product.
11: d. Middle of a string of 1s, so no arithmetic operation.
2. As in the previous algorithm, shift the Product register right (arith) 1 bit.
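A register-level sketch of the algorithm (my own Python model of AC, QR, and the extra bit Qn+1; the 8-bit width is an assumption):

```python
# A sketch of Booth's algorithm with the AC/QR/Qn+1 registers above.
def booth_multiply(multiplicand, multiplier, n=8):
    mask = (1 << n) - 1
    br = multiplicand & mask
    neg_br = (-multiplicand) & mask          # BR' + 1, i.e. -BR
    ac, qr, q_extra = 0, multiplier & mask, 0
    for _ in range(n):                       # do SC times
        pair = ((qr & 1) << 1) | q_extra     # the bits Qn, Qn+1
        if pair == 0b10:                     # beginning of a string of 1s
            ac = (ac + neg_br) & mask        # AC <- AC - BR
        elif pair == 0b01:                   # end of a string of 1s
            ac = (ac + br) & mask            # AC <- AC + BR
        q_extra = qr & 1                     # arithmetic shift right AC, QR
        qr = ((qr >> 1) | ((ac & 1) << (n - 1))) & mask
        ac = ((ac >> 1) | (ac & (1 << (n - 1)))) & mask
    result = (ac << n) | qr
    return result - (1 << 2 * n) if result >> (2 * n - 1) else result

print(booth_multiply(-9, -13))   # 117
```

Note the arithmetic (sign-preserving) right shift of AC, which is what lets the same loop handle signed operands.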
Hardware
Example: -9 x -13 = 117
DIVISION
Algorithms for Division
Division can be implemented using either a restoring or a non-restoring algorithm. An
inner loop to perform multiple subtractions must be incorporated into the algorithm.
       10      Quotient
 11 ) 1000     Dividend
       11
      ----
        10     Remainder
A logic circuit arrangement implements the restoring-division technique
The restoring-division algorithm:
S1: DO n times
Shift A and Q left one binary position.
Subtract M from A, placing the answer back in A.
S2: If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0
to 1.
A restoring-division example (1000 ÷ 11, i.e., 8 ÷ 3):

Initially        A = 00000   Q = 1000
                 M = 00011
First cycle:
  Shift          A = 00001   Q = 000_
  Subtract M     A = 11110
  Set q0 = 0, restore (add M back)
                 A = 00001   Q = 0000
Second cycle:
  Shift          A = 00010   Q = 000_
  Subtract M     A = 11111
  Set q0 = 0, restore
                 A = 00010   Q = 0000
Third cycle:
  Shift          A = 00100   Q = 00__
  Subtract M     A = 00001
  Set q0 = 1     A = 00001   Q = 0001
Fourth cycle:
  Shift          A = 00010   Q = 001_
  Subtract M     A = 11111
  Set q0 = 0, restore
                 A = 00010   Q = 0010

Remainder A = 00010, Quotient Q = 0010
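The same register sequence in Python (an illustrative model of mine; A holds the partial remainder and Q the forming quotient):

```python
# A sketch of the restoring-division loop over registers A, Q, M,
# as in the 1000 / 11 (8 / 3) example above.
def restoring_divide(dividend, divisor, n=4):
    a, q, m = 0, dividend, divisor
    for _ in range(n):
        # S1: shift A and Q left one position, then subtract M from A
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        a -= m
        # S2: if A went negative, set q0 = 0 and restore A; else q0 = 1
        if a < 0:
            a += m
        else:
            q |= 1
    return q, a      # quotient, remainder

print(restoring_divide(0b1000, 0b11))   # (2, 2)
```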
The non-restoring division algorithm:
S1: Do n times
If the sign of A is 0, shift A and Q left one binary position and subtract M from A;
otherwise, shift A and Q left and add M to A.
Then, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
S2: If the sign of A is 1, add M to A.
Assume the dividend and the divisor are 124 and 7, respectively. The non-restoring
division scheme would proceed as follows
(124 decimal = 01111100 binary). The M register contains the divisor 7 (M = 00000111).
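A sketch of the non-restoring scheme applied to 124 / 7 (my own model, including the q0-setting step that follows the sign of A):

```python
# A sketch of non-restoring division: add or subtract M depending on
# the sign of A, set q0 from the new sign, and restore once at the end.
def nonrestoring_divide(dividend, divisor, n=8):
    a, q, m = 0, dividend, divisor
    for _ in range(n):
        sign_negative = a < 0
        a = (a << 1) | ((q >> (n - 1)) & 1)      # shift A, Q left
        q = (q << 1) & ((1 << n) - 1)
        a = a + m if sign_negative else a - m    # add or subtract M
        if a >= 0:                               # set q0 from the sign of A
            q |= 1
    if a < 0:                                    # S2: final restoring add
        a += m
    return q, a      # quotient, remainder

print(nonrestoring_divide(124, 7))   # (17, 5)
```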
A few comparisons
Restoring division:
- most efficient for floating-point division, and for integer division when the divisor is
not small
- easy to implement
Non-restoring division:
- the main advantage is compatibility with 2's complement notation for the dividend
and divisor
Note that a single-precision floating-point number is normalized only if it can be
expressed in binary in the form 1.M × 2^E, where M is the 23-bit mantissa and the
exponent E is such that -126 ≤ E ≤ 127. A denormalized number requires an exponent less
than -126, in which case the number would be represented using the special pattern 0 for
the characteristic to denote an exponent of -126, with the significand expressed as a pure
fraction 0.M. Thus the value is 0.M × 2^-126.
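The normalized and denormalized interpretations above can be checked by unpacking the bit fields of a single-precision value; this is a sketch using Python's standard `struct` module, with `decode_single` as an illustrative helper name.

```python
import struct

def decode_single(x):
    """Decode an IEEE 754 single-precision value into (sign, E, significand)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    char = (bits >> 23) & 0xFF           # 8-bit characteristic (biased exponent)
    m = bits & 0x7FFFFF                  # 23-bit mantissa field M
    if char == 0:                        # denormalized: value = 0.M x 2^-126
        return sign, -126, m / 2**23
    return sign, char - 127, 1 + m / 2**23   # normalized: 1.M x 2^(char-127)

sign, e, sig = decode_single(6.5)        # 6.5 = 1.101 binary = 1.625 x 2^2
print(sign, e, sig)                      # -> 0 2 1.625
```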
The difference in exponents determines which of the significands is shifted: the significand of the number with the smaller exponent is shifted right by the exponent difference before the significands are added.
The result is then normalized and the exponent is adjusted if necessary.
The rounding hardware then creates the final result.
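The align/add/normalize steps above can be sketched as follows. This is a minimal sketch for adding two positive normalized numbers (significand in [1, 2)); subtraction, guard bits, and the rounding step are omitted.

```python
def fp_add(e1, s1, e2, s2):
    """Add two positive normalized floating-point numbers (e, significand)."""
    if e1 < e2:                          # make operand 1 the larger exponent
        e1, s1, e2, s2 = e2, s2, e1, s1
    s2 /= 2 ** (e1 - e2)                 # align: shift smaller significand right
    e, s = e1, s1 + s2                   # add the aligned significands
    while s >= 2:                        # normalize: shift right, bump exponent
        s /= 2
        e += 1
    return e, s

print(fp_add(2, 1.625, 0, 1.5))          # 6.5 + 1.5 = 8.0  -> (3, 1.0)
```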
ALU Design
A One Bit ALU
• This 1-bit ALU will perform AND, OR, and ADD
A One-bit Full Adder
This is also called a (3, 2) adder
Half Adder: has no CarryIn or CarryOut
Truth Table:
Logic Equation for CarryOut
• CarryOut = (!A & B & CarryIn) | (A & !B & CarryIn) | (A & B & !CarryIn) | (A & B & CarryIn)
• CarryOut = B & CarryIn | A & CarryIn | A & B
Logic Equation for Sum
• Sum = (!A & !B & CarryIn) | (!A & B & !CarryIn) | (A & !B & !CarryIn) | (A & B & CarryIn)
• Sum = A XOR B XOR CarryIn
• Truth Table for XOR:
Logic Diagrams for CarryOut and Sum
CarryOut = B & CarryIn | A & CarryIn | A & B
• Sum = A XOR B XOR CarryIn
A 4-bit ALU
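A 4-bit ALU can be built by replicating the 1-bit ALU and chaining the carries (ripple carry). The sketch below implements the equations above directly; the op encoding (0 = AND, 1 = OR, 2 = ADD) is illustrative.

```python
def alu_1bit(a, b, carry_in, op):
    """One-bit ALU: op 0 = AND, 1 = OR, 2 = ADD (equations from the text)."""
    s = a ^ b ^ carry_in                                    # Sum
    carry_out = (b & carry_in) | (a & carry_in) | (a & b)   # CarryOut
    result = [a & b, a | b, s][op]
    return result, carry_out

def alu_4bit(a, b, op):
    """Four 1-bit ALUs chained through the carry (ripple-carry)."""
    carry, result = 0, 0
    for i in range(4):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, op)
        result |= bit << i
    return result, carry

print(alu_4bit(0b0110, 0b0101, 2))   # 6 + 5 -> (11, 0)
```

As in the hardware, the carry chain is always active; it is simply ignored for the AND and OR operations.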
CONTROL UNIT ORGANIZATION
Instruction Execution
The CPU executes a sequence of instructions.
The execution of an instruction is organized as an instruction cycle: it is performed as a
succession of several steps;
• Each step is executed as a set of several microoperations.
• The task performed by any microoperation falls in one of the following categories:
- Transfer data from one register to another;
- Transfer data from a register to an external interface (system bus);
- Transfer data from an external interface to a register;
- Perform an arithmetic or logic operation, using registers for input and output.
Microoperations and Control Signals
In order to allow the execution of a microoperation, one or several control signals have to
be issued; they allow the corresponding data transfer and/or computation to be
performed.
Examples:
a) signals for transferring content of register R0 to R1:
R0out, R1in
b) signals for adding content of Y to that of R0 (result in Z):
R0out, Add, Zin
c) signals for reading a memory location; address in R3:
R3out, MARin, Read
• The CPU executes an instruction as a sequence of control steps. In each control step one
or several microoperations are executed.
• One clock pulse triggers the activities corresponding to one control step: for each
clock pulse, the control unit generates the control signals corresponding to the
microoperations to be executed in the respective control step.
Microoperations and Control Signals (cont’d)
Instruction:
ADD R1, R3 R1 ← R1 + R3
control steps and control signals:
instruction:
ADD R1, (R3) R1 ← R1 + [R3]
control steps and control signals:
instruction:
BR target unconditional branch (with relative addressing)
control steps and control signals:
• The first (three) control steps are identical for each instruction; they perform instruction
fetch and increment the PC. The following steps depend on the actual instruction (stored
in the IR).
• If a control step issues a read, the value will be available in the MBR after one
additional step.
• Several microoperations can be performed in the same control step if they don't conflict
(for example, only one of them is allowed to output on the bus).
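The control sequences for the instructions above (elided figures) can be represented as data: one set of control signals per control step. The sequence below is a sketch for ADD R1, R3 on a single-bus organization; signal names such as IncPC and the exact step ordering are illustrative, not taken verbatim from the figures.

```python
# Control sequence for ADD R1, R3 (R1 <- R1 + R3), one signal set per step.
# The first three steps are the common fetch phase described in the text.
ADD_R1_R3 = [
    {"PCout", "MARin", "Read", "IncPC"},   # 1: send PC to memory, start read
    {"WMFC"},                              # 2: wait for memory to respond
    {"MDRout", "IRin"},                    # 3: load the instruction into IR
    {"R3out", "Yin"},                      # 4: first operand to Y
    {"R1out", "Add", "Zin"},               # 5: Z <- Y + R1
    {"Zout", "R1in", "End"},               # 6: result to R1, end instruction
]

for step, signals in enumerate(ADD_R1_R3, start=1):
    print(step, sorted(signals))
```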
Control Unit
The basic task of the control unit:
- For each instruction the control unit causes the CPU to go through a sequence of control
steps;
- in each control step the control unit issues a set of signals which cause the
corresponding microoperations to be executed.
• The control unit is driven by the processor clock.
The signals to be generated at a certain moment depend on:
- the actual step to be executed;
- the condition and status flags of the processor;
- the actual instruction executed;
- external signals received on the system bus
(e.g. interrupt signal)
Control Unit
• Techniques for implementation of the control unit:
1. Hardwired control
2. Microprogrammed control
HARDWIRED CONTROL
To execute instructions, the processor must have some means of generating the control
signals needed in the proper sequence. Computer designers use a wide variety of
techniques to solve this problem. The approaches used fall into one of two categories:
hardwired control and microprogrammed control. We discuss each of these techniques in
detail, starting with hardwired control in this section.
Consider the sequence of control signals given in Figure 1.6. Each step in this sequence is
completed in one clock period. A counter may be used to keep track of the control steps,
as shown in Figure 1.10. Each state, or count, of this counter corresponds to one control
step. The required control signals are determined by the following information:
• Contents of the control step counter
• Contents of the instruction register
• Contents of the condition code flags
• External input signals, such as MFC and interrupt requests
Fig. 1.10 Control unit Organization
To gain insight into the structure of the control unit, we start with a simplified view of the
hardware involved. The decoder/encoder block in Figure 1.10 is a combinational circuit
that generates the required control outputs, depending on the state of all its inputs. By
separating the decoding and encoding functions, we obtain the more detailed block
diagram in Figure 1.11. The step decoder provides a separate signal line for each step, or
time slot, in the control sequence. Similarly, the output of the instruction decoder consists
of a separate line for each machine instruction. For any instruction loaded in the IR, one
of the output lines INS1 through INSm is set to 1, and all other lines are set to 0. The input
signals to the encoder block in Figure 1.11 are combined to generate the individual
control signals Yin, PCout, Add, End, and so on. An example of how the encoder generates
the Zin control signal for the processor organization in Figure 1.1 is given in Figure 1.12.
This circuit implements the logic function

Zin = T1 + T6 · ADD + T4 · BR + ...

This signal is asserted during time slot T1 for all instructions, during T6 for an Add
instruction, during T4 for an unconditional branch instruction, and so on. The logic
function for Zin is derived from the control sequences in Figures 1.6 and 1.7. As another
example, Figure 1.13 gives a circuit that generates the End control signal from the logic
function

End = T7 · ADD + T5 · BR + (T5 · N + T4 · N') · BRN + ...

The End signal starts a new instruction fetch cycle by resetting the control step counter
to its starting value. Figure 1.11 contains another control signal called RUN.
Fig.1.11 Separation of the decoding and encoding functions
Fig.1.12 Generation of the Zin control signal for the processor in Fig 1.1
Fig. 1.13 Generation of the End control signal.
When set to 1, RUN causes the counter to be incremented by one at the end of every
clock cycle. When RUN is equal to 0, the counter stops counting. This is needed
whenever the WMFC signal is issued, to cause the processor to wait for the reply from
the memory.
The control hardware shown in Figure 1.10 or 1.11 can be viewed as a state machine that
changes from one state to another in every clock cycle, depending on the contents of the
instruction register, the condition codes, and the external inputs. The outputs of the state
machine are the control signals. The sequence of operations carried out by this machine is
determined by the wiring of the logic elements, hence the name "hardwired." A controller
that uses this approach can operate at high speed. However, it has little flexibility, and the
complexity of the instruction set it can implement is limited.
• In the case of hardwired control, the control unit is a combinational circuit; it gets a set
of inputs (from the IR, flags, clock, and system bus) and transforms them into a set of
control signals.
Generation of signal Zin:
- first step of all instructions (fetch instruction)
- step 5 of ADD with register addressing
- step 5 of BR
- step 6 of ADD with register-indirect addressing
Zin =T1 +T5 ⋅ (ADDreg + BR) + T6 ⋅ ADDreg_ind +...
Generation of signal End:
- step 6 of ADD with register addressing
- step 7 of ADD with register-indirect addressing
- step 6 of BR
End = T6 ⋅ (ADDreg + BR) + T7 ⋅ ADDreg_ind + . . .
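The two signal equations above can be expressed directly as Boolean functions of the control step and the decoded instruction; the sketch below uses string instruction names purely for illustration.

```python
def zin(t, ins):
    """Zin = T1 + T5 . (ADDreg + BR) + T6 . ADDreg_ind  (equation above)."""
    return t == 1 or (t == 5 and ins in ("ADDreg", "BR")) or \
           (t == 6 and ins == "ADDreg_ind")

def end(t, ins):
    """End = T6 . (ADDreg + BR) + T7 . ADDreg_ind  (equation above)."""
    return (t == 6 and ins in ("ADDreg", "BR")) or \
           (t == 7 and ins == "ADDreg_ind")

print(zin(1, "BR"), zin(5, "ADDreg"), end(6, "BR"))   # -> True True True
```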
Advantages:
Hardwired control provides highest speed.
RISCs are implemented with hardwired control.
If the instruction set becomes very complex (CISCs) implementing hardwired
control is very difficult. In this case microprogrammed control units are used.
In order to allow execution of register-to-register operations in a single clock
cycle, RISCs (and other modern processors) use three-bus CPU structures.
MICROPROGRAMMED CONTROL
Microprogram
- Program stored in memory that generates all the control signals required to execute
the instruction set correctly
- Consists of microinstructions
Microinstruction
- Contains a control word and a sequencing word
Control Word - All the control information required for one clock cycle.
o a sequence of Nsig bits, where Nsig is the total number of control signals;
each bit in a CW corresponds to one control signal.
o Each control step during execution of an instruction defines a certain CW;
it represents a combination of 1s and 0s corresponding to the active and
non-active control signals
Sequencing Word - Information needed to decide the next microinstruction address
- Vocabulary to write a microprogram
Microprogrammed control - basic idea:
All microroutines corresponding to the machine instructions are stored in the
control store.
The control unit generates the sequence of control signals for a certain machine
instruction by reading from the control store the CWs of the microroutine
corresponding to the respective instruction.
The control unit is implemented just like another very simple CPU, inside the CPU,
executing microroutines stored in the control store.
Control Memory (Control Storage: CS)
- Storage in the microprogrammed control unit to store the microprogram
Writeable Control Memory (Writeable Control Storage: WCS)
- CS whose contents can be modified
-> Allows the microprogram to be changed
-> Instruction set can be changed or modified
Dynamic Microprogramming
- Computer system whose control unit is implemented with a micro program in WCS
- Microprogram can be changed by a systems programmer or a user
MICROPROGRAMMED CONTROL
In hardwired control, we saw how the control signals required inside the processor can be
generated using a control step counter and a decoder/encoder circuit. Now we discuss
an alternative scheme, called microprogrammed control, in which control signals are
generated by a program similar to machine language programs.
Fig.1.15 An example of micro instructions for the Fig. 1.6
First, we introduce some common terms. A control word (CW) is a word whose
individual bits represent the various control signals in Figure 1.11. Each of the control
steps in the control sequence of an instruction defines a unique combination of 1s and 0s
in the CW. The CWs corresponding to the 7 steps of Figure 1.6 are shown in Figure 1.15.
We have assumed that SelectY is represented by Select = 0 and Select4 by Select = 1. A
sequence of CWs corresponding to the control sequence of a machine instruction
constitutes the microroutine for that instruction, and the individual control words in this
microroutine are referred to as microinstructions.
The microroutines for all instructions in the instruction set of a computer are stored in a
special memory called the control store. The control unit can generate the control signals
for any instruction by sequentially reading the CWs of the corresponding microroutine
from the control store. This suggests organizing the control unit as shown in Figure 1.16.
To read the control words sequentially from the control store, a microprogram counter
(µPC) is used. Every time a new instruction is loaded into the IR, the output of the block
labeled "starting address generator" is loaded into the µPC. The µPC is then
automatically incremented by the clock, causing successive microinstructions to be read
from the control store. Hence, the control signals are delivered to various parts of the
processor in the correct sequence.
One important function of the control unit cannot be implemented by the simple
organization in Figure 1.16. This is the situation that arises when the control unit is
required to check the status of the condition codes or external inputs to choose between
alternative courses of action. In the case of hardwired control, this situation is handled by
including an appropriate logic function, as in Equation 1.2, in the encoder circuitry. In
microprogrammed control, an alternative approach is to use conditional branch
microinstructions. In addition to the branch address, these microinstructions specify
which of the external inputs, condition codes, or, possibly, bits of the instruction register,
should be checked as a condition for branching to take place.
The instruction Branch<0 may now be implemented by a microroutine such as that
shown in Figure 1.17. After loading this instruction into the IR, a branch microinstruction
transfers control to the corresponding microroutine, which is assumed to start at location
25 in the control store. This address is the output of the starting address generator block
in Figure 1.16. The microinstruction at location 25 tests the N bit of the condition
codes. If this bit is equal to 0, a branch takes place to location 0 to fetch a new machine
instruction. Otherwise, the microinstruction at location 26 is executed to put the branch
target address into register Z, as in step 4 in Figure 1.7. The microinstruction at location
27 loads this address into the PC.
Fig.1.16 Basic organization of a microprogrammed control unit
Fig.1.17 Micro routine for the instruction BRANCH<0
Fig. 1.18 Organization of the control unit to allow conditional branching in the
microprogram.
To support microprogram branching, the organization of the control unit should be
modified as shown in Figure 1.18. The starting address generator block of Figure 1.16
becomes the starting and branch address generator. This block loads a new address into
the µPC when a microinstruction instructs it to do so. To allow implementation of a
conditional branch, inputs to this block consist of the external inputs and condition codes
as well as the contents of the instruction register. In this control unit, the µPC is
incremented every time a new microinstruction is fetched from the microprogram
memory, except in the following situations:
1. When a new instruction is loaded into the IR, the µPC is loaded with the starting
address of the microroutine for that instruction.
2. When a Branch microinstruction is encountered and the branch condition is satisfied,
the µPC is loaded with the branch address.
3. When an End microinstruction is encountered, the µPC is loaded with the address of
the first CW in the microroutine for the instruction fetch cycle (this address is 0 in Figure
1.17).
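The µPC behavior described above (sequential fetch, conditional branch, End) can be sketched as a small interpreter over a control store. The store contents below follow the Branch<0 microroutine at locations 25-27; the microinstruction encoding and signal names such as Offsetout are illustrative assumptions.

```python
# Sketch of the control unit of Fig. 1.18: the control store holds CWs and
# branch microinstructions; the uPC advances sequentially except on a new
# instruction, a taken branch, or End.
CONTROL_STORE = {
    25: ("branch_if_N0", 0),              # Branch<0: if N == 0, go fetch (addr 0)
    26: ("cw", {"Offsetout", "Add", "Zin"}),
    27: ("cw", {"Zout", "PCin", "End"}),
}

def run_microroutine(start, n_flag):
    upc, issued = start, []
    while True:
        kind, arg = CONTROL_STORE[upc]
        if kind == "branch_if_N0":
            if n_flag == 0:
                return issued             # uPC loaded with fetch address
            upc += 1                      # condition false: fall through
            continue
        issued.append(arg)                # issue this control word
        if "End" in arg:
            return issued                 # uPC reloaded with fetch address
        upc += 1

print(len(run_microroutine(25, 1)))       # N set: two CWs issued -> 2
```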
Control Store Organization
• The control store contains the microprogram (sometimes called firmware).
Microroutine Executed for Conditional Branch
The microroutines contain, besides CWs, also branches, which have to be
interpreted by the microprogrammed controller.
The sequencer controls the correct execution sequence of microinstructions.
The sequencer is a small control unit of the control unit.
The greater ease and speed of designing a microprogrammed control unit versus the
design of a control unit based on a random logic implementation of the finite state
machine and next state function resulted in a significant reduction in design costs. It
was also much easier to correct errors in the microprogrammed system than in the
hardwired system.
With the lower cost and higher availability of fast RAM, some systems stored the
microcode in RAM, producing what is sometimes called a writable control store
(WCS) machine. This allowed corrections or changes to the microcode even after the
machine had been delivered.
It also made possible the loading of completely different instruction sets on the
same machine for different applications.
The main advantage of hardwired systems is their greater speed. This greater speed,
coupled with their much higher cost, tended to restrict their use to high-performance
computers.
With the trend toward simpler instructions and control, and the advent of computer-
aided design (CAD) tools, the design of hardwired control units has become much
easier and less prone to errors. RISC machines, with their goal of executing one or
more instructions per cycle, are becoming much more prevalent. These developments
tend to lead away from the use of microprogrammed control.
PLA CONTROL
SEQUENCER (MICROPROGRAM SEQUENCER)
The part of a microprogrammed control unit that determines the address of the
microinstruction to be executed in the next clock cycle. Sequencing functions include:
- In-line Sequencing
- Branch
- Conditional Branch
- Subroutine
- Loop
- Instruction OP-code mapping
MICROINSTRUCTION SEQUENCING
Sequencing Capabilities Required in a Control Storage
- Incrementing of the control address register
- Unconditional and conditional branches
(Sequencer block diagram, flattened in extraction: the instruction code feeds mapping logic; multiplexers select the next-address source among the incrementer, the branch address, the mapping logic output, and the subroutine register (SBR); the MUX-select and status-bit-select fields steer this choice via the branch logic; the selected address is loaded into the control address register (CAR), which addresses the control memory (ROM); the control memory outputs the microoperations.)
- A mapping process from the bits of the machine instruction to an address for control
memory
- A facility for subroutine call and return
Conditional Branch
If Condition is true, then Branch (address from
the next address field of the current microinstruction)
else Fall Through
Conditions to Test: O(overflow), N(negative),
Z(zero), C(carry), etc.
Unconditional Branch
Fixing the value of one status bit at the input of the multiplexer to 1
(Conditional-branch hardware, flattened in extraction: the status/condition bits enter a multiplexer; the condition-select field of the microinstruction selects one bit; depending on that bit, the control address register is either loaded with the next-address field of the microinstruction or incremented; the control memory outputs the microoperations.)
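The next-address selection described above can be sketched as one function of the MUX-select field; the select values (`increment`, `branch`, `map`, `return`) are illustrative names for the four sources.

```python
def next_address(car, mux_select, branch_addr, map_addr, sbr, status_bit):
    """Next value of the control address register (CAR) for the sequencer.

    mux_select chooses among: in-line sequencing (increment), conditional
    branch, instruction op-code mapping, and subroutine return from SBR.
    """
    if mux_select == "increment":
        return car + 1
    if mux_select == "branch":                 # conditional branch
        return branch_addr if status_bit else car + 1
    if mux_select == "map":                    # op-code mapping
        return map_addr
    if mux_select == "return":                 # subroutine return
        return sbr
    raise ValueError(mux_select)

print(next_address(40, "branch", 25, 0, 0, 1))   # condition true -> 25
```

An unconditional branch is obtained, as the text notes, by fixing the selected status bit to 1.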
MICROINSTRUCTION FORMAT
Information in a Microinstruction
- Control Information
- Sequencing Information
- Constant: information which is useful when fed into the system
This information needs to be organized in some way for:
- Efficient use of the microinstruction bits
- Fast decoding
Field Encoding
- Encoding the microinstruction bits
- Encoding slows down the execution speed due to the decoding delay
- Encoding also reduces the flexibility due to the decoding hardware
Horizontal Microinstructions
Each bit directly controls each micro-operation or each control point
Horizontal implies a long microinstruction word
Advantages: Can control a variety of components operating in parallel.
--> Advantage of efficient hardware utilization
Disadvantages: Control word bits are not fully utilized
CS becomes large --> Costly
In general, the number of bits in a microinstruction ranges from around a dozen to over a
hundred. The exact number depends on the complexity of the datapath and on the
number and types of instructions, as well as the number of allowed instruction operands
and their addressing modes.
A horizontal microcode system uses minimal encoding to specify the control
information. For example, if there are 32 registers that might be used as an operand, then
a separate bit would signal whether the corresponding register is selected. Or if there
were 128 different operations that could be specified, then a separate bit would be used
for each. A disadvantage of this approach is that relatively few of the actions specified by
bits in the microinstruction can occur in parallel and only one register at a time can be
selected as a source or destination operand. This leads to the presence of many zeros in
the microinstruction, and creates a lot of wasted space in the memory.
Vertical Microinstructions
A microinstruction format that is not horizontal
Vertical implies a short microinstruction word
Encoded Microinstruction fields
--> Needs decoding circuits for one or two levels of decoding
In a vertical microcode system, the widths of fields such as the register number and
ALU operation are reduced by encoding the information in a shorter form. For example,
any one of 32 registers can be specified using a 5-bit field, or 7 bits could be used to encode up to
128 different operations. The main disadvantage of this approach when compared to the
horizontal microcode system is the slower operation due to the need to decode the fields.
One-level decoding (diagram flattened in extraction): a 2-bit field A feeds a 2 x 4 decoder, activating 1 of 4 lines, and a 3-bit field B feeds a 3 x 8 decoder, activating 1 of 8 lines.
Two-level decoding (diagram flattened in extraction): a 2-bit field A feeds a 2 x 4 decoder, and a 6-bit field B feeds a 6 x 64 decoder; the decoder outputs pass through further decoder and selection logic.
Nanostorage and Nanoinstruction
Nanoinstructions are used to drive a lookup table of microinstructions in a machine
where a nanostore is used. This is appropriate where many of the microinstructions occur
several times throughout the microprogram. In this case, the distinct microinstructions are
placed in a small nanostore, and the control store then contains (in order) the index in the
nanostore of the appropriate microinstruction.
Usually, the microprogram consists of a large number of short microinstructions, while
the nanoprogram contains fewer words with longer nanoinstructions.
The decoder circuits in a vertical microprogram storage organization can be replaced by a
ROM
=> Two levels of control storage
First level - Control Storage
Second level - Nano Storage
Two-level microprogram
First level
-Vertical format Microprogram
Second level
-Horizontal format Nanoprogram
-Interprets the microinstruction fields, thus converts a vertical
microinstruction format into a horizontal nanoinstruction format.
Two-Level Microprogramming - Example
Microprogram: 2048 microinstructions of 200 bits each
With 1-Level Control Storage: 2048 x 200 = 409,600 bits
Assumption: 256 distinct microinstructions among 2048
With 2-Level Control Storage:
o Nano Storage: 256 x 200 bits to store 256 distinct nanoinstructions
o Control storage: 2048 x 8 bits
o To address 256 nano storage locations 8 bits are needed
Total 1-Level control storage: 409,600 bits
Total 2-Level control storage: 67,584 bits (256 x 200 + 2048 x 8)
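The storage arithmetic above can be checked directly:

```python
n_micro, width = 2048, 200            # 2048 microinstructions of 200 bits each
n_distinct = 256                      # distinct microinstructions among them
index_bits = 8                        # 2^8 = 256 nanostore addresses

one_level = n_micro * width                          # flat control storage
two_level = n_distinct * width + n_micro * index_bits  # nano + index store

print(one_level, two_level)           # -> 409600 67584
```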
(Diagram flattened in extraction: an 11-bit control address register addresses the 2048 x 8 control memory; each 8-bit microinstruction serves as the nanomemory address into the 256 x 200 nanomemory, which outputs the 200-bit nanoinstruction.)
THE MEMORY SYSTEM
Programs and the data they operate on are held in the memory of the computer. In this
chapter, we discuss how this vital part of the computer operates. By now, the reader
appreciates that the execution speed of programs is highly dependent on the speed with
which instructions and data can be transferred between the processor and the memory. It
is also important to have a large memory to facilitate execution of programs that are large
and deal with huge amounts of data.
Ideally, the memory would be fast, large, and inexpensive. Unfortunately, it is impossible
to meet all three of these requirements simultaneously. Increased speed and size are
achieved at increased cost. To solve this problem, much work has gone into developing
clever structures that improve the apparent speed and size of the memory, yet keep the
cost reasonable.
First, we describe the most common components and organizations used to implement
the memory. Then we examine memory speed and discuss how the apparent speed of the
memory can be increased by means of caches. Next, we present the virtual memory
concept, which increases the apparent size of the memory. Finally, we discuss the
secondary storage devices, which provide much larger storage capability.
SOME BASIC CONCEPTS
The maximum size of the memory that can be used in any computer is determined by
the addressing scheme. For example, a 16-bit computer that generates 16-bit addresses is
capable of addressing up to 2^16 = 64K memory locations. Similarly, machines whose
instructions generate 32-bit addresses can utilize a memory that contains up to 2^32 = 4G
(giga) memory locations, whereas machines with 40-bit addresses can access up to 2^40 =
1T (tera) locations. The number of locations represents the size of the address space of
the computer.
Most modern computers are byte addressable. Figure 2.7 shows the possible address
assignments for a byte-addressable 32-bit computer. The big-endian arrangement is used
in the 68000 processor. The little-endian arrangement is used in Intel processors. The
ARM architecture can be configured to use either arrangement. As far as the memory
structure is concerned, there is no substantial difference between the two schemes.
The memory is usually designed to store and retrieve data in word-length quantities. In
fact, the number of bits actually stored or retrieved in one memory access is the most
common definition of the word length of a computer. Consider, for example, a byte-
addressable computer whose instructions generate 32-bit addresses. When a 32-bit
address is sent from the processor to the memory unit, the high-order 30 bits determine
which word will be accessed. If a byte quantity is specified, the low-order 2 bits of the
address specify which byte location is involved. In a Read operation, other bytes may be
fetched from the memory, but they are ignored by the processor. If the byte operation is a
Write, however, the control circuitry of the memory must ensure that the contents of
other bytes of the same word are not changed.
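The word/byte address split described above is simple bit slicing; for a byte-addressable machine with 4-byte words:

```python
def split_address(addr):
    """Split a 32-bit byte address: the high-order 30 bits select the word,
    the low-order 2 bits select the byte within that word."""
    return addr >> 2, addr & 0b11     # (word address, byte offset)

print(split_address(0x1007))          # -> (1025, 3)
```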
Modern implementations of computer memory are rather complex and difficult to
understand on first encounter. To simplify our introduction to memory structures, we will
first present a traditional architecture. Then, in later sections, we will discuss the latest
approaches.
From the system standpoint, we can view the memory unit as a black box. Data transfer
between the memory and the processor takes place through the use of two processor
registers, usually called MAR (memory address register) and MDR (memory data
register), as introduced in Section 1.2. If MAR is k bits long and MDR is n bits long, then
the memory unit may contain up to 2^k addressable locations. During a memory cycle, n
bits of data are transferred between the memory and the processor. This transfer takes
place over the processor bus, which has k address lines and n data lines. The bus also
includes the control lines Read/Write (R/W) and Memory Function Completed (MFC) for
coordinating data transfers. Other control lines may be added to indicate the number of
bytes to be transferred. The connection between the processor and the memory is shown
schematically in Figure 4.1.
The processor reads data from the memory by loading the address of the required
memory location into the MAR register and setting the R/W line to 1. The memory
responds by placing the data from the addressed location onto the data lines, and
confirms this action by asserting the MFC signal. Upon receipt of the MFC signal, the
processor loads the data on the data lines into the MDR register.
The processor writes data into a memory location by loading the address of this
location into MAR and loading the data into MDR. It indicates that a write operation is
involved by setting the R/W line to 0.
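The read and write protocols just described can be sketched as a toy model of the processor-memory interface; the class and method names are illustrative.

```python
class Memory:
    """Toy model of the interface above: the address goes in MAR, the data
    in MDR, and the R/W line selects the operation."""
    def __init__(self, n_words):
        self.cells = [0] * n_words
        self.mar = self.mdr = 0

    def cycle(self, rw):
        """One memory cycle; rw = 1 means Read, rw = 0 means Write."""
        if rw:
            self.mdr = self.cells[self.mar]   # memory drives the data lines
        else:
            self.cells[self.mar] = self.mdr   # data lines written to memory
        return True                           # MFC: operation completed

mem = Memory(16)
mem.mar, mem.mdr = 5, 42
mem.cycle(0)                          # write 42 to location 5
mem.mar = 5
mem.cycle(1)                          # read it back into MDR
print(mem.mdr)                        # -> 42
```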
If read or write operations involve consecutive address locations in the main memory,
then a ―block transfer‖ operation can be performed in which the only address sent to the
memory is the one that identifies the first location.
Memory accesses may be synchronized using a clock, or they may be controlled using
special signals that control transfers on the bus, using the bus signaling schemes. Memory
read and write operations are controlled as input and output bus transfers, respectively.
A useful measure of the speed of memory units is the time that elapses between the
initiation of an operation and the completion of that operation, for example, the time
between the Read and the MFC signals. This is referred to as the memory access time.
Another important measure is the memory cycle time, which is the minimum time delay
required between the initiation of two successive memory operations, for example, the
time between two successive Read operations. The cycle time is usually slightly longer
than the access time, depending on the implementation details of the memory unit.
A memory unit is called random-access memory (RAM) if any location can be accessed
for a Read or Write operation in some fixed amount of time that is independent of the
location's address. This distinguishes such memory units from serial, or partly serial,
access storage devices such as magnetic disks and tapes. Access time on the latter devices
depends on the address or position of the data.
The basic technology for implementing the memory uses semiconductor integrated
circuits. The sections that follow present some basic facts about the internal structure and
operation of such memories. We then discuss some of the techniques used to increase the
effective speed and size of the memory.
The processor of a computer can usually process instructions and data faster than they
can be fetched from a reasonably priced memory unit. The memory cycle time, then, is
the bottleneck in the system. One way to reduce the memory access time is to use a cache
memory. This is a small, fast memory that is inserted between the larger, slower main
memory and the processor. It holds the currently active segments of a program and their
data.
Virtual memory is another important concept related to memory organization. So far, we
have assumed that the addresses generated by the processor directly specify physical
locations in the memory. This may not always be the case. For reasons that will become
apparent later in this chapter, data may be stored in physical memory locations that have
addresses different from those specified by the program. The memory control circuitry
translates the address specified by the program into an address that can be used to access
the physical memory. In such a case, an address generated by the processor is referred to
as a virtual or logical address. The virtual address space is mapped onto the physical
memory where data are actually stored. The mapping function is implemented by a
special memory control circuit, often called the memory management unit. This mapping
function can be changed during program execution according to system requirements.
Virtual memory is used to increase the apparent size of the physical memory. Data are
addressed in a virtual address space that can be as large as the addressing capability of the
processor. But at any given time, only the active portion of this space is mapped onto
locations in the physical memory. The remaining virtual addresses are mapped onto the
bulk storage devices used, which are usually magnetic disks. As the active portion of the
virtual address space changes during program execution, the memory management unit
changes the mapping function and transfers data between the disk and the memory. Thus,
during every memory cycle, an address-processing mechanism determines whether the
addressed information is in the physical memory unit. If it is, then the proper word is
accessed and execution proceeds. If it is not, a page of words containing the desired word
is transferred from the disk to the memory. This page displaces some page in the memory
that is currently inactive.
Because of the time required to move pages between the disk and the memory, there is a
speed degradation if pages are moved frequently. By judiciously choosing which page to
replace in the memory, however, there may be reasonably long periods when the
probability is high that the words accessed by the processor are in the physical memory
unit.
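The translation and page-fault mechanism described above can be sketched in a few lines of Python. This is an illustrative model only: the page size, page-table contents, and function names are invented for the example, not taken from any particular system.

```python
# Sketch of virtual-to-physical address translation by a memory management
# unit. PAGE_SIZE and the table contents are assumed values for illustration.
PAGE_SIZE = 4096  # bytes per page (a common, but assumed, figure)

def translate(virtual_addr, page_table, memory_pages):
    """Return the physical address, or None to signal a page fault."""
    page_number = virtual_addr // PAGE_SIZE
    offset = virtual_addr % PAGE_SIZE
    frame = page_table.get(page_number)      # mapping maintained by the OS
    if frame is None or frame not in memory_pages:
        return None                          # page fault: page must come from disk
    return frame * PAGE_SIZE + offset
```

With `page_table = {0: 5, 1: 9}` and frames 5 and 9 resident, address 4100 (page 1, offset 4) maps to 9 × 4096 + 4, while any address on page 2 causes a page fault.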
This section has briefly introduced several organizational features of memory systems.
These features have been developed to help provide a computer system with as large and
as fast a memory as can be afforded in relation to the overall cost of the system. We do
not expect the reader to grasp all the ideas or their implications now; more detail is given
later. We introduce these terms together to establish that they are related; a study of their
interrelationships is as important as a detailed study of their individual features.
4.1 Memory Hierarchy
We have already stated that an ideal memory would be fast, large, and inexpensive. It is
clear that a very fast memory can be implemented if SRAM chips are used. But these
chips are expensive because their basic cells have six transistors, which precludes packing a
very large number of cells onto a single chip. Thus, for cost reasons, it is impractical to
build a large memory using SRAM chips. The alternative is to use Dynamic RAM chips,
which have much simpler basic cells and thus are much less expensive. But such
memories are significantly slower.
Although dynamic memory units in the range of hundreds of megabytes can be
implemented at a reasonable cost, the affordable size is still small compared to the
demands of large programs with voluminous data. A solution is provided by using
secondary storage, mainly magnetic disks, to implement large memory spaces. Very large
disks are available at a reasonable price, and they are used extensively in computer
systems. However, they are much slower than the semiconductor memory units. So we
conclude the following: A huge amount of cost-effective storage can be provided by
magnetic disks. A large, yet affordable, main memory can be built with dynamic RAM
technology. This leaves SRAMs to be used in smaller units where speed is of the essence,
such as in cache memories.
All of these different types of memory units are employed effectively in a computer. The
entire computer memory can be viewed as the hierarchy depicted in Figure 5.13. The
fastest access is to data held in processor registers. Therefore, if we consider the registers
to be part of the memory hierarchy, then the processor registers are at the top in terms of
the speed of access. Of course, the registers provide only a minuscule portion of the
required memory.
At the next level of the hierarchy is a relatively small amount of memory that can be
implemented directly on the processor chip. This memory, called a processor cache,
holds copies of instructions and data stored in a much larger memory that is provided
externally. There are often two levels of caches. A primary cache is always located on the
processor chip. This cache is small because it competes for space on the processor chip,
which must implement many other functions. The primary cache is referred to as level 1
(L1) cache. A larger, secondary cache is placed between the primary cache and the rest of
the memory. It is referred to as level 2 (L2) cache. It is usually implemented using SRAM
chips.
Including a primary cache on the processor chip and using a larger, off-chip, secondary
cache is currently the most common way of designing computers. However, other
arrangements can be found in practice. It is possible not to have a cache on the processor
chip at all. Also, it is possible to have both L1 and L2 caches on the processor chip.
The next level in the hierarchy is called the main memory. This rather large memory
is implemented using dynamic memory components, typically in the form of SIMMs,
DIMMs, or RIMMs. The main memory is much larger but significantly slower than the
cache memory. In a typical computer, the access time for the main memory is about ten
times longer than the access time for the L1 cache.
Disk devices provide a huge amount of inexpensive storage. They are very slow
compared to the semiconductor devices used to implement the main memory.
During program execution, the speed of memory access is of utmost importance. The key
to managing the operation of the hierarchical memory system in Figure 5.13 is to bring
the instructions and data that will be used in the near future as close to the processor as
possible. This can be done by using the mechanisms presented in the sections that follow.
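The payoff of keeping soon-to-be-used data close to the processor can be quantified with the usual average-access-time formula. The figures below are illustrative assumptions (a 1 ns L1 cache, a main memory ten times slower as stated above, and a 95% hit rate), not measurements from the text.

```python
def average_access_time(hit_rate, cache_time, memory_time):
    """Average time per access when hits are served by the cache
    and misses fall through to the slower main memory."""
    return hit_rate * cache_time + (1 - hit_rate) * memory_time

# Assumed figures: L1 access 1 ns, main memory 10 ns, 95% hit rate.
# 0.95 * 1 + 0.05 * 10 = 1.45 ns -- far closer to the cache speed
# than to the memory speed, which is the point of the hierarchy.
t = average_access_time(0.95, 1.0, 10.0)
```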
4.2 SEMICONDUCTOR RAM MEMORIES
Semiconductor memories are available in a wide range of speeds. Their cycle times range
from 100 ns to less than 10 ns. When first introduced in the late 1960s, they were much
more expensive than the magnetic-core memories they replaced. Because of rapid
advances in VLSI (Very Large Scale Integration) technology, the cost of semiconductor
memories has dropped dramatically. As a result, they are now used almost exclusively in
implementing memories. In this section, we discuss the main characteristics of
semiconductor memories. We start by introducing the way that a number of memory cells
are organized inside a chip.
4.2.1 INTERNAL ORGANIZATION OF MEMORY CHIPS
Memory cells are usually organized in the form of an array, in which each cell is capable
of storing one bit of information. A possible organization is illustrated in Figure 5.2. Each
row of cells constitutes a memory word, and all cells of a row are connected to a common
line referred to as the word line, which is driven by the address decoder on the chip. The
cells in each column are connected to a Sense/Write circuit by two bit lines. The
Sense/Write circuits are connected to the data input/output lines of the chip. During a
Read operation, these circuits sense, or read, the information stored in the cells selected
by a word line and transmit this information to the output data lines. During a Write
operation, the Sense/Write circuits receive input information and store it in the cells of
the selected word.
Figure 5.2 is an example of a very small memory chip consisting of 16 words of 8 bits
each. This is referred to as a 16 x 8 organization. The data input and the data output of
each Sense/Write circuit are connected to a single bidirectional data line that can be
connected to the data bus of a computer. Two control lines, R/W and CS, are provided in
addition to address and data lines. The R/W (Read/Write) input specifies the required
operation, and the CS (Chip Select) input selects a given chip in a multichip memory
system.
The memory circuit in Figure 5.2 stores 128 bits and requires 14 external connections for
address, data, and control lines. Of course, it also needs two lines for power supply and
ground connections. Consider now a slightly larger memory circuit, one that has 1K
(1024) memory cells. This circuit can be organized as a 128 x 8 memory, requiring a total
of 19 external connections. Alternatively, the same number of cells can be organized into
a 1K x 1 format. In this case, a 10-bit address is needed, but there is only one data line,
resulting in 15 external connections. Figure 5.3 shows such an organization. The required
10-bit address is divided into two groups of 5 bits each to form the row and column
addresses for the cell array. A row address selects a row of 32 cells, all of which are
accessed in parallel. However, according to the column address, only one of these cells is
connected to the external data line by the output multiplexer and input demultiplexer.
Commercially available memory chips contain a much larger number of memory cells
than the examples shown in Figures 5.2 and 5.3. We use small examples to make the
figures easy to understand. Large chips have essentially the same organization as Figure
5.3 but use a larger memory cell array and have more external connections. For example,
a 4M-bit chip may have a 512K x 8 organization, in which case 19 address and 8 data
input/output pins are needed. Chips with a capacity of hundreds of megabits are now
available.
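The pin counts quoted above follow from simple arithmetic, which the sketch below reproduces. Note one assumption in reading the text: the totals of 19 (for 128 x 8) and 15 (for 1K x 1) appear to include the two power and ground lines, while the figure of 14 for the 16 x 8 chip excludes them.

```python
import math

def pin_count(words, bits_per_word):
    """External connections for a memory chip: address lines + bidirectional
    data lines + 2 control (R/W, CS) + 2 for power supply and ground."""
    address_lines = math.ceil(math.log2(words))
    data_lines = bits_per_word          # one bidirectional line per bit
    return address_lines + data_lines + 2 + 2

# 128 x 8: 7 + 8 + 2 + 2 = 19 connections; 1K x 1: 10 + 1 + 2 + 2 = 15.
```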
STATIC MEMORIES
Memories that consist of circuits capable of retaining their state as long as power is applied are
known as static memories. Figure 5.4 illustrates how a static RAM (SRAM) cell may be
implemented. Two inverters are cross-connected to form a latch. The latch is connected to two bit
lines by transistors T1 and T2. These transistors act as switches that can be opened or closed
under control of the word line. When the word line is at ground level, the transistors are turned
off and the latch retains its state. For example, let us assume that the cell is in state 1 if the logic
value at point X is 1 and at point Y is 0. This state is maintained as long as the signal on the word
line is at ground level.
Read Operation
In order to read the state of the SRAM cell, the word line is activated to close switches T1 and
T2. If the cell is in state 1, the signal on bit line b is high and the signal on bit line b‘ is low. The
opposite is true if the cell is in state 0. Thus, b and b‘ are complements of each other. Sense/Write
circuits at the end of the bit lines monitor the state of b and b‘ and set the output accordingly.
Write Operation
The state of the cell is set by placing the appropriate value on bit line b and its
complement on b’, and then activating the word line. This forces the cell into the
corresponding state. The required signals on the bit lines are generated by the
Sense/Write circuit.
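The Read and Write behavior just described can be captured in a small behavioral model. This is a sketch of the cell's logic, not of its circuit: the class and method names are invented, and the word line is reduced to a single enable flag.

```python
class SRAMCell:
    """Behavioral sketch of the static cell of Figure 5.4: a latch isolated
    from the bit lines whenever the word line is at ground level."""

    def __init__(self):
        self.state = 0                  # value held by the cross-coupled inverters

    def read(self, word_line):
        """Return the pair (b, b') on the bit lines, or None if isolated."""
        if not word_line:               # T1/T2 open: cell keeps its state, no output
            return None
        return self.state, 1 - self.state   # b and b' are complements

    def write(self, word_line, b):
        """Bit lines force the latch only while the word line is active."""
        if word_line:
            self.state = b
```

A write with the word line inactive has no effect, mirroring how the latch retains its state while T1 and T2 are turned off.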
4.2.2 ASYNCHRONOUS DRAMS
Static RAMs are fast, but they come at a high cost because their cells require several
transistors. Less expensive RAMs can be implemented if simpler cells are used.
However, such cells do not retain their state indefinitely; hence, they are called dynamic
RAMs (DRAMs).
Information is stored in a dynamic memory cell in the form of a charge on a capacitor,
and this charge can be maintained for only tens of milliseconds. Since the cell is required
to store information for a much longer time, its contents must be periodically refreshed
by restoring the capacitor charge to its full value.
An example of a dynamic memory cell that consists of a capacitor, C, and a transistor, T,
is shown in Figure 5.6. In order to store information in this cell, transistor
T is turned on and an appropriate voltage is applied to the bit line. This causes a known
amount of charge to be stored in the capacitor. After the transistor is turned off, the
capacitor begins to discharge. This is caused by the capacitor‘s own leakage resistance
and by the fact that the transistor continues to conduct a tiny amount of current, measured
in picoamperes, after it is turned off. Hence, the information stored in the cell can be
retrieved correctly only if it is read before the charge on the capacitor drops below some
threshold value. During a Read operation, the transistor in a selected cell is turned on. A
sense amplifier connected to the bit line detects whether the charge stored on the
capacitor is above the threshold value. If so, it drives the bit line to a full voltage that
represents logic value 1. This voltage recharges the capacitor to the full charge that
corresponds to logic value 1. If the sense amplifier detects that the charge on the
capacitor is below the threshold value, it pulls the bit line to ground level, which ensures
that the capacitor will have no charge, representing logic value 0. Thus, reading the
contents of the cell automatically refreshes its contents. All cells in a selected row are
read at the same time, which refreshes the contents of the entire row.
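The leak-then-refresh behavior of a dynamic cell can be sketched as follows. The charge levels, leak rate, and threshold are invented numbers chosen only to make the mechanism visible; a real cell's decay is continuous and measured in tens of milliseconds, as stated above.

```python
class DRAMCell:
    """Sketch of a one-transistor dynamic cell: stored charge leaks away,
    and a Read restores it to full value (refresh). Numbers are illustrative."""
    FULL, THRESHOLD = 1.0, 0.5

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        self.charge = self.FULL if bit else 0.0

    def leak(self, fraction=0.1):
        self.charge *= (1 - fraction)   # capacitor leakage between accesses

    def read(self):
        """Sense amplifier: compare against the threshold, then restore."""
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.write(bit)                 # reading automatically refreshes the cell
        return bit
```

A cell read before its charge falls below the threshold is restored to full charge; one read too late returns 0 and is "refreshed" to the wrong value, which is why every row must be accessed periodically.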
A 16-megabit DRAM chip, configured as 2M x 8, is shown in Figure 5.7. The cells are
organized in the form of a 4K x 4K array. The 4096 cells in each row are divided into 512
groups of 8, so that a row can store 512 bytes of data. Therefore, 12 address bits are
needed to select a row. Another 9 bits are needed to specify a group of 8 bits in the
selected row. Thus, a 21-bit address is needed to access a byte in this memory. The high-
order 12 bits and the low-order 9 bits of the address constitute the row and column
addresses of a byte, respectively. To reduce the number of pins needed for external
connections, the row and column addresses are multiplexed on 12 pins. During a Read or
a Write operation, the row address is applied first. It is loaded into the row address latch
in response to a signal pulse on the Row Address Strobe (RAS) input of the chip. Then a
Read operation is initiated, in which all cells on the selected row are read and refreshed.
Shortly after the row address is loaded, the column address is applied to the address pins
and loaded into the column address latch under control of the Column Address Strobe
(CAS) signal. The information in this latch is decoded and the appropriate group of 8
Sense/Write circuits are selected. If the R/W control signal indicates a Read operation,
the output values of the selected circuits are transferred to the data lines, D7−0. For a
Write operation, the information on the D7−0 lines is transferred to the selected circuits.
This information is then used to overwrite the contents of the selected cells in the
corresponding 8 columns. We should note that in commercial DRAM chips, the RAS and
CAS control signals are active low so that they cause the latching of addresses when they
change from high to low. To indicate this fact, these signals are shown on diagrams with an
overbar, as RAS and CAS.
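The 21-bit address split described above is plain bit manipulation, sketched below for the 2M x 8 chip of Figure 5.7 (12 row bits, 9 column bits, sharing 12 address pins).

```python
def split_dram_address(addr):
    """Split the 21-bit byte address of the 2M x 8 chip into the 12-bit row
    address and 9-bit column address that are multiplexed on 12 pins."""
    assert 0 <= addr < 2**21
    row = addr >> 9              # high-order 12 bits, latched by RAS
    column = addr & 0x1FF        # low-order 9 bits, latched by CAS
    return row, column
```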
Applying a row address causes all cells on the corresponding row to be read and
refreshed during both Read and Write operations. To ensure that the contents of a DRAM
are maintained, each row of cells must be accessed periodically. A refresh circuit usually
performs this function automatically. Many dynamic memory chips incorporate a refresh
facility within the chips themselves. In this case, the dynamic nature of these memory
chips is almost invisible to the user.
In the DRAM described in this section, the timing of the memory device is controlled
asynchronously. A specialized memory controller circuit provides the necessary control
signals, RAS and CAS, that govern the timing. The processor must take into account the
delay in the response of the memory. Such memories are referred to as asynchronous
DRAMs.
Because of their high density and low cost, DRAMs are widely used in the memory units
of computers. Available chips range in size from 1M to 256M bits, and even larger chips
are being developed. To reduce the number of memory chips needed in a given computer,
a DRAM chip is organized to read or write a number of bits in parallel, as indicated in
Figure 5.7. To provide flexibility in designing memory systems, these chips are
manufactured in different organizations. For example, a 64-Mbit chip may be organized
as 16M x 4, 8M x 8, or 4M x 16.
4.2.3 Synchronous DRAMs
More recent developments in memory technology have resulted in DRAMs whose
operation is directly synchronized with a clock signal. Such memories are known as
synchronous DRAMs (SDRAMs). Figure 5.8 indicates the structure of an SDRAM. The
cell array is the same as in asynchronous DRAMs. The address and data connections are
buffered by means of registers. We should particularly note that the output of each
sense amplifier is connected to a latch. A Read operation loads the contents of all cells
in the selected row into these latches; an access made only for refreshing purposes
merely refreshes the contents of the cells without changing the latches. Data held in the
latches that correspond to the selected column(s) are transferred into the data output
register, thus becoming available on the data output pins.
SDRAMs have several different modes of operation, which can be selected by
writing control information into a mode register. For example, burst operations of
different lengths can be specified. The burst operations use the block transfer capability
described earlier, known as the fast page mode feature. In SDRAMs, it is not necessary to
provide externally generated pulses on the CAS line to select successive columns. The
necessary control signals are provided internally using a column counter and the clock
signal. New data can be placed on the data lines in each clock cycle. All actions
are triggered by the rising edge of the clock.
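A burst transfer can be sketched as follows: once the row and starting column are latched, an internal counter supplies successive column addresses, one per clock cycle, without further external CAS pulses. The function below is a behavioral sketch with invented names, operating on a cell array represented as a list of rows.

```python
def burst_read(row, start_column, burst_length, array):
    """Sketch of an SDRAM burst read: the internal column counter steps
    through consecutive columns, delivering one data item per clock cycle."""
    data = []
    column = start_column
    for _ in range(burst_length):       # one transfer per rising clock edge
        data.append(array[row][column])
        column += 1                     # internal column counter, no new CAS needed
    return data
```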
STRUCTURE OF LARGER MEMORIES
We have discussed the basic organization of memory circuits as they may be
implemented on a single chip. Next, we should examine how memory chips may be
connected to form a much larger memory.
Static Memory Systems
Consider a memory consisting of 2M (2,097,152) words of 32 bits each. Figure 5.10
shows how we can implement this memory using 512K x 8 static memory chips. Each
column in the figure consists of four chips, which implement one byte position. Four of
these sets provide the required 2M x 32 memory. Each chip has a control input called
Chip Select. When this input is set to 1, it enables the chip to accept data from or to place
data on its data lines. The data output for each chip is of the three-state type. Only the
selected chip places data on the data output line, while all other outputs are in the high-
impedance state. Twenty-one address bits are needed to select a 32-bit word in this
memory. The high-order 2 bits of the address are decoded to determine which of the four
Chip Select control signals should be activated, and the remaining 19 address bits are
used to access specific byte locations inside each chip of the selected row. The R/W
inputs of all chips are tied together to provide a common Read/Write control (not shown
in the figure).
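The address decoding for the 2M x 32 memory of Figure 5.10 can be sketched directly: the high-order 2 bits of the 21-bit address pick one of the four Chip Select lines, and the remaining 19 bits address a byte location inside each 512K x 8 chip of the selected row.

```python
def decode(addr):
    """Return (chip_select, within_chip) for the 2M x 32 memory built
    from rows of four 512K x 8 chips."""
    assert 0 <= addr < 2**21
    chip_select = addr >> 19            # high-order 2 bits: which row of chips
    within_chip = addr & (2**19 - 1)    # 19-bit address inside each chip
    return chip_select, within_chip
```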
Dynamic Memory Systems
The organization of large dynamic memory systems is essentially the same as the
memory shown in Figure 5.10. However, physical implementation is often done more
conveniently in the form of memory modules.
Modern computers use very large memories; even a small personal computer is likely to
have at least 32M bytes of memory. Typical workstations have at least 128M bytes of
memory. A large memory leads to better performance because more of the programs and
data used in processing can be held in the memory, thus reducing the frequency of
accessing the information in secondary storage. However, if a large memory is built by
placing DRAM chips directly on the main system printed-circuit board that contains the
processor, often referred to as a motherboard, it will occupy an unacceptably large
amount of space on the board. Also, it is awkward to provide for future expansion of the
memory, because space must be allocated and wiring provided for the maximum
expected size. These packaging considerations have led to the development of larger
memory units known as SIMMs (Single In-line Memory Modules) and DIMMs (Dual In-
line Memory Modules). Such a module is an assembly of several memory chips on a
separate small board that plugs vertically into a single socket on the motherboard. SIMMs
and DIMMs of different sizes are designed to use the same size socket. For example, 4M
x 32, 16M x 32, and 32M x 32 bit DIMMs all use the same 100-pin socket. Similarly, 8M
x 64, 16M x 64, 32M x 64, and 64M x 72 DIMMs use a 168-pin socket. Such modules
occupy a smaller amount of space on a motherboard, and they allow easy expansion by
replacement if a larger module uses the same socket as the smaller one.
4.3 READ-ONLY MEMORIES (ROMs)
Both SRAM and DRAM chips are volatile, which means that they lose the stored
information if power is turned off. There are many applications that need memory
devices which retain the stored information if power is turned off. For example, in
a typical computer a hard disk drive is used to store a large amount of information,
including the operating system software. When a computer is turned on, the operating
system software has to be loaded from the disk into the memory. This requires execution
of a program that ―boots‖ the operating system. Since the boot program is quite large,
most of it is stored on the disk. The processor must execute some instructions that load
the boot program into the memory. If the entire memory consisted of only volatile
memory chips, the processor would have no means of accessing these instructions. A
practical solution is to provide a small amount of nonvolatile memory that holds the
instructions whose execution results in loading the boot program from the disk.
Nonvolatile memory is used extensively in embedded systems. Such systems typically do
not use disk storage devices. Their programs are stored in nonvolatile semiconductor
memory devices.
Different types of nonvolatile memory have been developed. Generally, the contents of
such memory can be read as if they were SRAM or DRAM memories. But, a special
writing process is needed to place the information into this memory. Since its normal
operation involves only reading of stored data, a memory of this type is called read-only
memory (ROM).
4.3.1 ROM
Figure 5.12 shows a possible configuration for a ROM cell. A logic value 0 is stored in
the cell if the transistor is connected to ground at point P; otherwise, a 1 is stored. The bit
line is connected through a resistor to the power supply. To read the state of the cell, the
word line is activated. Thus, the transistor switch is closed and the voltage on the bit line
drops to near zero if there is a connection between the transistor and ground. If there is no
connection to ground, the bit line remains at the high voltage, indicating a 1. A sense
circuit at the end of the bit line generates the proper output value. Data are written into a
ROM when it is manufactured.
4.3.2 PROM
Some ROM designs allow the data to be loaded by the user, thus providing a
programmable ROM (PROM). Programmability is achieved by inserting a fuse at point P
in Figure 5.12. Before it is programmed, the memory contains all 0s. The user can insert
1s at the required locations by burning out the fuses at these locations using high-current
pulses. Of course, this process is irreversible. PROMs provide flexibility and convenience
not available with ROMs. The latter are economically attractive for storing fixed
programs and data when high volumes of ROMs are produced. However, the cost of
preparing the masks needed for storing a particular information pattern in ROMs makes
them very expensive when only a small number are required. In this case, PROMs
provide a faster and considerably less expensive approach because they can be
programmed directly by the user.
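The defining property of fuse-based programming is that writes are one-way: a location can go from 0 to 1 when its fuse is burned, but never back. The sketch below models that constraint (class and method names are invented for illustration).

```python
class PROM:
    """Sketch of fuse-based PROM programming: the device ships as all 0s,
    and burning a fuse irreversibly changes a 0 to a 1."""

    def __init__(self, size):
        self.bits = [0] * size          # unprogrammed state: all fuses intact

    def burn(self, location):
        """High-current pulse blows the fuse -- there is no way to undo this."""
        self.bits[location] = 1

    def read(self, location):
        return self.bits[location]
```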
4.3.3 EPROM
Another type of ROM chip allows the stored data to be erased and new data to be loaded.
Such an erasable, reprogrammable ROM is usually called an EPROM. It provides
considerable flexibility during the development phase of digital systems. Since EPROMs
are capable of retaining stored information for a long time, they can be used in place of
ROMs while software is being developed. In this way, memory changes and updates
can be easily made.
An EPROM cell has a structure similar to the ROM cell in Figure 5.12. In an EPROM
cell, however, the connection to ground is always made at point P, and a special
transistor is used, which has the ability to function either as a normal transistor or as a
disabled transistor that is always turned off. This transistor can be programmed to
behave as a permanently open switch, by injecting charge into it that becomes trapped
inside. Thus, an EPROM cell can be used to construct a memory in the same way as the
previously discussed ROM cell.
The important advantage of EPROM chips is that their contents can be erased and
reprogrammed. Erasure requires dissipating the charges trapped in the transistors of
memory cells; this can be done by exposing the chip to ultraviolet light. For this reason,
EPROM chips are mounted in packages that have transparent windows.
4.3.4 EEPROM
A significant disadvantage of EPROMs is that a chip must be physically removed from
the circuit for reprogramming and that its entire contents are erased by the ultraviolet
light. It is possible to implement another version of erasable PROMs that can be both
programmed and erased electrically. Such chips, called EEPROMs, do not have to be
removed for erasure. Moreover, it is possible to erase the cell contents selectively. The
only disadvantage of EEPROMs is that different voltages are needed for erasing, writing,
and reading the stored data.
FLASH MEMORY
An approach similar to EEPROM technology has more recently given rise to flash
memory devices. A flash cell is based on a single transistor controlled by trapped charge,
just like an EEPROM cell. While similar in some respects, there are also substantial
differences between flash and EEPROM devices. In EEPROM it is possible to read and
write the contents of a single cell. In a flash device it is possible to read the contents
of a single cell, but it is only possible to write an entire block of cells. Prior to writing,
the previous contents of the block are erased. Flash devices have greater density, which
leads to higher capacity and a lower cost per bit. They require a single power supply
voltage, and consume less power in their operation.
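The read/write asymmetry of flash can be made concrete with a short sketch: single cells can be read individually, but a write replaces a whole block, which must be erased first. The block size of 4 cells is an invented figure chosen only for readability; real flash blocks are far larger.

```python
class Flash:
    """Sketch of the flash constraint above: cell-level reads, but writes
    only at block granularity, with an erase before each program step."""
    BLOCK = 4                           # cells per block (assumed, for illustration)

    def __init__(self, blocks):
        self.cells = [0] * (blocks * self.BLOCK)

    def read(self, addr):
        """Reading a single cell is allowed, as in EEPROM."""
        return self.cells[addr]

    def write_block(self, block_no, data):
        """Writing replaces an entire block: erase first, then program."""
        assert len(data) == self.BLOCK
        base = block_no * self.BLOCK
        self.cells[base:base + self.BLOCK] = [0] * self.BLOCK   # erase
        self.cells[base:base + self.BLOCK] = list(data)         # program
```

Each erase/program cycle wears the cells, which is why the number of times a block can be rewritten is bounded, as noted below.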
The low power consumption of flash memory makes it attractive for use in portable
equipment that is battery driven. Typical applications include hand-held computers, cell
phones, digital cameras, and MP3 music players. In hand-held computers and cell
phones, flash memory holds the software needed to operate the equipment, thus obviating
the need for a disk drive. In digital cameras, flash memory is used to store picture image
data. In MP3 players, flash memory stores the data that represent sound.
Cell phones, digital cameras, and MP3 players are good examples of embedded systems.
Single flash chips do not provide sufficient storage capacity for the applications
mentioned above. Larger memory modules consisting of a number of chips are needed.
There are two popular choices for the implementation of such modules: flash cards and
flash drives.
Flash Cards
One way of constructing a larger module is to mount flash chips on a small card.
Such flash cards have a standard interface that makes them usable in a variety of
products. A card is simply plugged into a conveniently accessible slot. Flash cards
come in a variety of memory sizes. Typical sizes are 8, 32, and 64 Mbytes. A minute
of music can be stored in about 1 Mbyte of memory, using the MP3 encoding format.
Hence, a 64-MB flash card can store an hour of music.
Flash Drives
Larger flash memory modules have been developed to replace hard disk drives.
These flash drives are designed to fully emulate the hard disks, to the point that they
can be fitted into standard disk drive bays. However, the storage capacity of flash drives
is significantly lower. Currently, the capacity of flash drives is less than one gigabyte. In
contrast, hard disks can store many gigabytes.
The fact that flash drives are solid-state electronic devices that have no movable parts
provides some important advantages. They have shorter seek and access times, which
results in faster response. They have lower power consumption, which makes them
attractive for battery driven applications, and they are also insensitive to vibration.
The disadvantages of flash drives relative to hard disk drives are their smaller capacity and
higher cost per bit. Disks provide an extremely low cost per bit. Another disadvantage is that the
flash memory will deteriorate after it has been written a number of times. Fortunately,
this number is high, typically at least one million times.
4.4 MEMORY SYSTEM CONSIDERATIONS
The choice of a RAM chip for a given application depends on several factors. Foremost
among these factors are the cost, speed, power dissipation, and size of the chip.
Static RAMs are generally used only when very fast operation is the primary
requirement. Their cost and size are adversely affected by the complexity of the circuit
that realizes the basic cell. They are used mostly in cache memories. Dynamic RAMs are
the predominant choice for implementing computer main memories. The high densities
achievable in these chips make large memories economically feasible.
Memory Controller
To reduce the number of pins, the dynamic memory chips use multiplexed address inputs.
The address is divided into two parts. The high-order address bits, which select a row in
the cell array, are provided first and latched into the memory chip under control of the
RAS signal. Then, the low-order address bits, which select a column, are provided on the
same address pins and latched using the CAS signal.
A typical processor issues all bits of an address at the same time. The required
multiplexing of address bits is usually performed by a memory controller circuit, which is
interposed between the processor and the dynamic memory as shown in Figure 5.11.
The controller accepts a complete address and the R/W signal from the processor, under
control of a Request signal which indicates that a memory access operation is needed.
The controller then forwards the row and column portions of the address to the memory
and generates the RAS and CAS signals. Thus, the controller provides the RAS-CAS
timing, in addition to its address multiplexing function. It also sends the R/W and CS
signals to the memory. The CS signal is usually active low; hence it is shown with an overbar in
Figure 5.11. Data lines are connected directly between the processor and the memory.
Note that the clock signal is needed in SDRAM chips.
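The address-splitting step the controller performs can be sketched as follows. The 24-bit address width and the equal 12/12 row/column split are illustrative assumptions, not values from the text; real chips use whatever widths their cell array requires.

```python
# Sketch of the address multiplexing a memory controller performs.
# The high-order bits (the row) are presented first and latched by RAS;
# the low-order bits (the column) follow on the same pins, latched by CAS.

ROW_BITS = 12   # assumed row-address width
COL_BITS = 12   # assumed column-address width

def split_address(addr):
    """Return (row, column) in the order the controller presents them."""
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return row, col

row, col = split_address(0xABC123)   # row = 0xABC, column = 0x123
```

The key point is that both halves travel over the same physical address pins, which is what halves the pin count.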
When used with DRAM chips, which do not have self-refreshing capability, the memory
controller has to provide all the information needed to control the refreshing process. It
contains a refresh counter that provides successive row addresses. Its function is to cause
the refreshing of all rows to be done within the period specified for a particular device.
Refresh Overhead
All dynamic memories have to be refreshed. In older DRAMs, a typical period for
refreshing all rows was 16 ms. In typical SDRAMs, this period is 64 ms.
Consider an SDRAM whose cells are arranged in 8K (=8 192) rows. Suppose that it takes
four clock cycles to access (read) each row. Then, it takes 8192 x 4 = 32,768 cycles to
refresh all rows. At a clock rate of 133 MHz, the time needed to refresh all rows is
32,768/(133 x 10^6) = 246 x 10^-6 seconds. Thus, the refreshing process occupies 0.246 ms in
each 64-ms time interval. Therefore, the refresh overhead is 0.246/64 ≈ 0.0038, which is
less than 0.4 percent of the total time available for accessing the memory.
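The refresh-overhead arithmetic above can be checked with a short calculation:

```python
# Refresh overhead for the SDRAM example in the text:
# 8K rows, 4 clock cycles to refresh each row, 133 MHz clock, 64 ms period.

rows = 8192
cycles_per_row = 4
clock_hz = 133e6
refresh_period_s = 64e-3

refresh_time_s = rows * cycles_per_row / clock_hz   # about 246 microseconds
overhead = refresh_time_s / refresh_period_s        # about 0.0038
```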
4.5 ASSOCIATIVE MEMORY
Associative memory (content-addressable memory, CAM) is a memory that is capable
of determining whether a given datum, the search word, is contained in one of its
addresses or locations. This may be accomplished by a number of mechanisms. In some
cases parallel combinational logic is applied at each word in the memory and a test is
made simultaneously for coincidence with the search word. In other cases the search
word and all of the words in the memory are shifted serially in synchronism; a single bit
of the search word is then compared to the same bit of all of the memory words using as
many single-bit coincidence circuits as there are words in the memory. Extensions of
the associative memory technique allow for masking the search word or requiring only a
"close" match as opposed to an exact match. Small parallel associative memories are
used in cache memory and virtual memory mapping applications.
Since parallel operations on many words are expensive (in hardware), a variety of
stratagems are used to approximate associative memory operation without actually
carrying out the full test described here. One of these uses hashing to generate a "best
guess" for a conventional address followed by a test of the contents of that address.
Some associative memories have been built to be accessed conventionally (by words in
parallel) and as serial comparison associative memories; these have been called
orthogonal memories. See also associative addressing, associative processor.
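A behavioral sketch of an associative search with masking, as described above. The hardware compares every stored word against the search word in parallel; software can only model that parallelism with a loop. The word width and stored contents are made up for illustration.

```python
# Model of a content-addressable search. mask selects which bit
# positions participate in the comparison (1 = compare this bit),
# which is the masking feature mentioned in the text.

def cam_search(words, search_word, mask=0xF):
    """Return the indices of all stored words that match search_word
    in the bit positions selected by mask."""
    return [i for i, w in enumerate(words)
            if (w & mask) == (search_word & mask)]

memory = [0b1010, 0b1011, 0b0110, 0b1010]

exact = cam_search(memory, 0b1010)                # full 4-bit match
masked = cam_search(memory, 0b1010, mask=0b1110)  # ignore lowest bit
```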
4.6 VIRTUAL MEMORIES
In most modern computer systems, the physical main memory is not as large as the
address space spanned by an address issued by the processor. For example, a processor
that issues 32-bit addresses has an addressable space of 4G bytes. The size of the main
memory in a typical computer ranges from a few hundred megabytes to 1G bytes. When
a program does not completely fit into the main memory, the parts of it not currently
being executed are stored on secondary storage devices, such as magnetic disks. Of
course, all parts of a program that are eventually executed are first brought into the
main memory. When a new segment of a program is to be moved into a full memory, it
must replace another segment already in the memory. In modern computers, the operating
system moves programs and data automatically between the main memory and secondary
storage. Thus, the application programmer does not need to be aware of limitations
imposed by the available main memory.
Techniques that automatically move program and data blocks into the physical main
memory when they are required for execution are called virtual-memory techniques.
Programs, and hence the processor, reference an instruction and data space that is
independent of the available physical main memory space. The binary addresses that the
processor issues for either instructions or data are called virtual or logical addresses.
These addresses are translated into physical addresses by a combination of hardware and
software components. If a virtual address refers to a part of the program or data space that
is currently in the physical memory, then the contents of the appropriate location in the
main memory are accessed immediately. On the other hand, if the referenced address is
not in the main memory, its contents must be brought into a suitable location in the
memory before they can be used.
Figure 5.26 shows a typical organization that implements virtual memory. A special
hardware unit, called the Memory Management Unit (MMU), translates virtual addresses
into physical addresses. When the desired data (or instructions) are in the main memory,
these data are fetched as described in our presentation of the cache mechanism. If the
data are not in the main memory, the MMU causes the operating system to bring the data
into the memory from the disk. Transfer of data between the disk and the main memory is
performed using the DMA scheme.
ADDRESS TRANSLATION
A simple method for translating virtual addresses into physical addresses is to assume
that all programs and data are composed of fixed-length units called pages, each of which
consists of a block of words that occupy contiguous locations in the main memory. Pages
commonly range from 2K to 16K bytes in length. They constitute the basic unit of
information that is moved between the main memory and the disk whenever the
translation mechanism determines that a move is required. Pages should not be too small,
because the access time of a magnetic disk is much longer (several milliseconds) than the
access time of the main memory. The reason for this is that it takes a considerable
amount of time to locate the data on the disk, but once located, the data can be transferred
at a rate of several megabytes per second. On the other hand, if pages are too large it is
possible that a substantial portion of a page may not be used, yet this unnecessary data
will occupy valuable space in the main memory.
This discussion clearly parallels the concepts introduced in Section 5.5 on cache memory.
The cache bridges the speed gap between the processor and the main memory and is
implemented in hardware. The virtual-memory mechanism bridges the size and speed
gaps between the main memory and secondary storage and is usually implemented in part
by software techniques. Conceptually, cache techniques and virtual-memory techniques
are very similar. They differ mainly in the details of their implementation.
A virtual-memory address translation method based on the concept of fixed-length pages
is shown schematically in Figure 5.27. Each virtual address generated by the processor,
whether it is for an instruction fetch or an operand fetch/store operation, is interpreted as
a virtual page number (high-order bits) followed by an offset (low-order bits) that
specifies the location of a particular byte (or word) within a page. Information about the
main memory location of each page is kept in a page table. This information includes the
main memory address where the page is stored and the current status of the page. An area
in the main memory that can hold one page is called a page frame. The starting address of
the page table is kept in a page table base register. By adding the virtual page number to
the contents of this register, the address of the corresponding entry in the page table is
obtained. The contents of this location give the starting address of the page if that page
currently resides in the main memory.
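The lookup just described can be sketched in a few lines. The page size, the page-table contents, and the addresses are illustrative values, and the table is modeled as a dictionary rather than a region of main memory indexed from a base register.

```python
# Sketch of page-based address translation: split the virtual address
# into (virtual page number, offset), look the page up, and add the
# offset to the page's starting address in main memory.

PAGE_SIZE = 4096          # assumed 4K-byte pages -> 12-bit offset
OFFSET_BITS = 12

# page_table[virtual page number] = page-frame start address (page-aligned)
page_table = {0: 0x8000, 1: 0x3000, 5: 0xC000}

def translate(vaddr):
    vpage = vaddr >> OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    if vpage not in page_table:
        raise KeyError("page fault")   # page not resident in main memory
    return page_table[vpage] + offset

paddr = translate(0x1234)   # vpage 1, offset 0x234 -> 0x3234
```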
Each entry in the page table also includes some control bits that describe the status of the
page while it is in the main memory. One bit indicates the validity of the page, that is,
whether the page is actually loaded in the main memory. This bit allows the operating
system to invalidate the page without actually removing it. Another bit indicates whether
the page has been modified during its residency in the memory. As in cache memories,
this information is needed to determine whether the page should be written back to the
disk before it is removed from the main memory to make room for another page. Other
control bits indicate various restrictions that may be imposed on accessing the page. For
example, a program may be given full read and write permission, or it may be restricted
to read accesses only.
The page table information is used by the MMU for every read and write access, so
ideally, the page table should be situated within the MMU. Unfortunately, the page table
may be rather large, and since the MMU is normally implemented as part of the processor
chip (along with the primary cache), it is impossible to include a complete page table on
this chip. Therefore, the page table is kept in the main memory. However, a copy of a
small portion of the page table can be accommodated within the MMU.
This portion consists of the page table entries that correspond to the most recently
accessed pages. A small cache, usually called the Translation Lookaside Buffer (TLB), is
incorporated into the MMU for this purpose. The operation of the TLB with respect to the
page table in the main memory is essentially the same as the operation we have discussed
in conjunction with the cache memory. In addition to the information that constitutes a
page table entry, the TLB must also include the virtual address of the entry. Figure 5.28
shows a possible organization of a TLB where the associative-mapping technique is
used. Set-associative mapped TLBs are also found in commercial products.
An essential requirement is that the contents of the TLB be coherent with the contents of
page tables in the memory. When the operating system changes the contents of page
tables, it must simultaneously invalidate the corresponding entries in the TLB. One of the
control bits in the TLB is provided for this purpose. When an entry is invalidated, the
TLB will acquire the new information as part of the MMU‘s normal response to access
misses.
Address translation proceeds as follows. Given a virtual address, the MMU looks in the
TLB for the referenced page. If the page table entry for this page is found in the TLB, the
physical address is obtained immediately. If there is a miss in the TLB, then the required
entry is obtained from the page table in the main memory and the TLB is updated.
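This try-the-TLB-first, fall-back-to-memory flow can be modeled directly. The table contents and the hit/miss bookkeeping are illustrative; a real TLB also has limited capacity and an eviction policy, which this sketch omits.

```python
# Sketch of the TLB lookup flow: consult the small TLB first; on a miss,
# read the entry from the page table in main memory and update the TLB.

page_table = {0: 10, 1: 22, 2: 7, 3: 15}   # vpage -> frame number
tlb = {}                                    # small on-chip copy of entries
stats = {"hits": 0, "misses": 0}

def lookup(vpage):
    if vpage in tlb:
        stats["hits"] += 1
    else:
        stats["misses"] += 1
        tlb[vpage] = page_table[vpage]      # slow path: page table in memory
    return tlb[vpage]

frames = [lookup(v) for v in [1, 1, 2, 1]]  # second and fourth are TLB hits
```

Because of locality of reference, successive accesses tend to hit the same few pages, which is why even a small TLB captures most translations.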
When a program generates an access request to a page that is not in the main memory, a
page fault is said to have occurred. The whole page must be brought from the disk into
the memory before access can proceed. When it detects a page fault, the MMU asks the
operating system to intervene by raising an exception (interrupt). Processing of the active
task is interrupted, and control is transferred to the operating system. The operating
system then copies the requested page from the disk into the main memory and returns
control to the interrupted task. Because a long delay occurs while the page transfer takes
place, the operating system may suspend execution of the task that caused the page fault
and begin execution of another task whose pages are in the main memory.
It is essential to ensure that the interrupted task can continue correctly when it resumes
execution. A page fault occurs when some instruction accesses a memory operand that is
not in the main memory, resulting in an interruption before the execution of this
instruction is completed. Hence, when the task resumes, either the execution of the
interrupted instruction must continue from the point of interruption, or the instruction
must be restarted. The design of a particular processor dictates which of these options
should be used.
If a new page is brought from the disk when the main memory is full, it must replace one
of the resident pages. The problem of choosing which page to remove is just as critical
here as it is in a cache, and the idea that programs spend most of their time in a few
localized areas also applies. Because main memories are considerably larger than cache
memories, it should be possible to keep relatively larger portions of a program in the
main memory. This will reduce the frequency of transfers to and from the disk. Concepts
similar to the LRU replacement algorithm can be applied to page replacement, and the
control bits in the page table entries can indicate usage. One simple scheme is based on a
control bit that is set to 1 whenever the corresponding page is referenced (accessed). The
operating system occasionally clears this bit in all page table entries, thus providing a
simple way of determining which pages have not been used recently.
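The referenced-bit scheme in the last sentence can be sketched as follows; the page names are illustrative.

```python
# Sketch of the referenced-bit approximation to LRU: the bit is set on
# every access, and the operating system periodically clears all bits.
# Pages whose bit is still 0 at sweep time have not been used recently.

ref_bit = {p: 0 for p in ["A", "B", "C", "D"]}

def access(page):
    ref_bit[page] = 1

def os_sweep():
    """Report unreferenced pages, then clear all bits for the next interval."""
    unused = [p for p, bit in ref_bit.items() if bit == 0]
    for p in ref_bit:
        ref_bit[p] = 0
    return unused

access("A")
access("C")
not_recently_used = os_sweep()   # candidates for replacement: B and D
```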
A modified page has to be written back to the disk before it is removed from the main
memory. It is important to note that the write-through protocol, which is useful in the
framework of cache memories, is not suitable for virtual memory. The access time of the
disk is so long that it does not make sense to access it frequently to write small amounts
of data.
The address translation process in the MMU requires some time to perform, mostly
dependent on the time needed to look up entries in the TLB. Because of locality of
reference, it is likely that many successive translations involve addresses on the same
page. This is particularly evident in fetching instructions. Thus, we can reduce the
average translation time by including one or more special registers that retain the virtual
page number and the physical page frame of the most recently performed translations.
The information in these registers can be accessed more quickly than the TLB.
4.7 CACHE MEMORIES
The speed of the main memory is very low in comparison with the speed of modern
processors. For good performance, the processor cannot spend much of its time waiting
to access instructions and data in main memory. Hence, it is important to devise a scheme
that reduces the time needed to access the necessary information. Since the speed of the
main memory unit is limited by electronic and packaging constraints, the solution must
be sought in a different architectural arrangement. An efficient solution is to use a fast
cache memory which essentially makes the main memory appear to the processor to be
faster than it really is.
The effectiveness of the cache mechanism is based on a property of computer programs
called locality of reference. Analysis of programs shows that most of their execution time
is spent on routines in which many instructions are executed repeatedly. These
instructions may constitute a simple loop, nested loops, or a few procedures that
repeatedly call each other. The actual detailed pattern of instruction sequencing is not
important — the point is that many instructions in localized areas of the program are
executed repeatedly during some time period, and the remainder of the program is
accessed relatively infrequently. This is referred to as locality of reference. It manifests
itself in two ways: temporal and spatial. The first means that a recently executed
instruction is likely to be executed again very soon. The spatial aspect means that
instructions in close proximity to a recently executed instruction (with respect to the
instructions‘ addresses) are also likely to be executed soon.
If the active segments of a program can be placed in a fast cache memory, then the total
execution time can be reduced significantly. Conceptually, operation of a cache memory
is very simple. The memory control circuitry is designed to take advantage of the
property of locality of reference. The temporal aspect of the locality of reference suggests
that whenever an information item (instruction or data) is first needed, this item should be
brought into the cache where it will hopefully remain until it is needed again. The spatial
aspect suggests that instead of fetching just one item from the main memory to the cache,
it is useful to fetch several items that reside at adjacent addresses as well. We will use the
term block to refer to a set of contiguous address locations of some size. Another term
that is often used to refer to a cache block is cache line.
Consider the simple arrangement in Figure 5.14. When a Read request is received from
the processor, the contents of a block of memory words containing the location
specified are transferred into the cache one word at a time. Subsequently,
when the program references any of the locations in this block, the desired contents are
read directly from the cache. Usually, the cache memory can store a reasonable number
of blocks at any given time, but this number is small compared to the total number of
blocks in the main memory. The correspondence between the main memory blocks and
those in the cache is specified by a mapping function. When the cache is full and a
memory word (instruction or data) that is not in the cache is referenced, the cache control
hardware must decide which block should be removed to create space for the new block
that contains the referenced word. The collection of rules for making this decision
constitutes the replacement algorithm.
The processor does not need to know explicitly about the existence of the cache. It simply
issues Read and Write requests using addresses that refer to locations in the memory. The
cache control circuitry determines whether the requested word currently exists in the
cache. If it does, the Read or Write operation is performed on the appropriate cache
location. In this case, a read or write hit is said to have occurred. In a Read operation, the
main memory is not involved. For a Write operation, the system can proceed in two
ways. In the first technique, called the write-through protocol, the cache location and the
main memory location are updated simultaneously. The second technique is to update
only the cache location and to mark it as updated with an associated flag bit, often called
the dirty or modified bit. The main memory location of the word is updated later, when
the block containing this marked word is to be removed from the cache to make room for
a new block. This technique is known as the write-back, or copy-back, protocol. The
write-through protocol is simpler, but it results in unnecessary Write operations in the
main memory when a given cache word is updated several times during its cache
residency. Note that the write-back protocol may also result in unnecessary Write
operations because when a cache block is written back to the memory all words of the
block are written back, even if only a single word has been changed while the block was
in the cache.
When the addressed word in a Read operation is not in the cache, a read miss occurs. The
block of words that contains the requested word is copied from the main memory into the
cache. After the entire block is loaded into the cache, the particular word requested is
forwarded to the processor. Alternatively, this word may be sent to the processor as soon
as it is read from the main memory. The latter approach, which is called load-through, or
early restart, reduces the processor‘s waiting period somewhat, but at the expense of
more complex circuitry.
During a Write operation, if the addressed word is not in the cache, a write miss occurs.
Then, if the write-through protocol is used, the information is written directly into the
main memory. In the case of the write-back protocol, the block containing the addressed
word is first brought into the cache, and then the desired word in the cache is overwritten
with the new information.
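The behavioral difference between the two write policies can be sketched for a single cached word (this is not a full cache model; addresses and values are illustrative):

```python
# Minimal contrast of write-through vs. write-back for one cached word.

main_memory = {0x10: 5}
cache = {0x10: {"data": 5, "dirty": False}}

def write(addr, value, policy):
    cache[addr]["data"] = value
    if policy == "write-through":
        main_memory[addr] = value        # memory updated immediately
    else:                                # write-back
        cache[addr]["dirty"] = True      # memory updated only on eviction

def evict(addr):
    if cache[addr]["dirty"]:
        main_memory[addr] = cache[addr]["data"]   # deferred write-back
    del cache[addr]

write(0x10, 9, "write-back")
stale = main_memory[0x10]   # still 5: the memory copy is temporarily stale
evict(0x10)                 # now main memory holds 9
```

The stale intermediate value is exactly why a write-back system must flush dirty blocks before a DMA device reads main memory directly.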
4.8 MAPPING FUNCTIONS
To discuss possible methods for specifying where memory blocks are placed in the cache,
we use a specific small example. Consider a cache consisting of 128 blocks of 16 words
each, for a total of 2048 (2K) words, and assume that the main memory is addressable by
a 16-bit address. The main memory has 64K words, which we will view as 4K blocks of
16 words each. For simplicity, we will assume that consecutive addresses refer to
consecutive words.
4.8.1 Direct Mapping
The simplest way to determine cache locations in which to store memory blocks
is the direct-mapping technique. In this technique, block j of the main memory maps onto
block j modulo 128 of the cache, as depicted in Figure 5.15. Thus, whenever one of the
main memory blocks 0, 128, 256, ... is loaded in the cache, it is stored in cache block 0.
Blocks 1, 129, 257, ... are stored in cache block 1, and so on. Since more than one
memory block is mapped onto a given cache block position, contention may arise for that
position even when the cache is not full. For example, instructions of a program may start
in block 1 and continue in block 129, possibly after a branch. As this program is
executed, both of these blocks must be transferred to the block-1 position in the cache.
Contention is resolved by allowing the new block to overwrite the currently resident
block. In this case, the replacement algorithm is trivial.
Placement of a block in the cache is determined from the memory address. The memory
address can be divided into three fields, as shown in Figure 5.15. The low-order 4 bits
select one of 16 words in a block. When a new block enters the cache, the 7-bit cache
block field determines the cache position in which this block must be stored. The high-
order 5 bits of the memory address of the block are stored in 5 tag bits associated with its
location in the cache. They identify which of the 32 blocks that are mapped into this
cache position are currently resident in the cache. As execution proceeds, the 7-bit cache
block field of each address generated by the processor points to a particular block
location in the cache. The high-order 5 bits of the address are compared with the tag bits
associated with that cache location. If they match, then the desired word is in that block
of the cache. If there is no match, then the block containing the required word must first
be read from the main memory and loaded into the cache. The direct-mapping technique
is easy to implement, but it is not very flexible.
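The field extraction for this example (16-bit address = 5-bit tag, 7-bit block, 4-bit word) can be written out directly:

```python
# Address-field extraction for the direct-mapped cache in the text:
# 128 cache blocks of 16 words, 16-bit main-memory addresses.

def fields(addr):
    word = addr & 0xF             # low-order 4 bits: word within block
    block = (addr >> 4) & 0x7F    # next 7 bits: cache block position
    tag = (addr >> 11) & 0x1F     # high-order 5 bits: tag
    return tag, block, word

# Memory blocks 1 and 129 contend for cache block 1, as in the text's
# example (a block's first word is at address block_number * 16).
t1, b1, _ = fields(1 * 16)
t2, b2, _ = fields(129 * 16)
```

Blocks 1 and 129 yield the same block field (1) but different tags (0 and 1), which is how the cache tells them apart.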
4.8.2 Associative Mapping
Figure 5.16 shows a much more flexible mapping method, in which a main memory
block can be placed into any cache block position. In this case, 12 tag bits are required to
identify a memory block when it is resident in the cache. The tag bits of an address
received from the processor are compared to the tag bits of each block of the cache to see
if the desired block is present. This is called the associative-mapping technique. It gives
complete freedom in choosing the cache location in which to place the memory block.
Thus, the space in the cache can be used more efficiently. A new block that has to be
brought into the cache has to replace (eject) an existing block only if the cache is full. In
this case, we need an algorithm to select the block to be replaced. Many replacement
algorithms are possible. The cost of an associative cache is higher than the cost of a
direct-mapped cache because of the need to search all 128 tag patterns to determine
whether a given block is in the cache. A search of this kind is called an associative
search. For performance reasons, the tags must be searched in parallel.
4.8.3 Set-Associative Mapping
A combination of the direct- and associative-mapping techniques can be used. Blocks of
the cache are grouped into sets, and the mapping allows a block of the main memory to
reside in any block of a specific set. Hence, the contention problem of the direct method
is eased by having a few choices for block placement. At the same time, the hardware
cost is reduced by decreasing the size of the associative search. An example of this set-
associative-mapping technique is shown in Figure 5.17 for a cache with two blocks per
set. In this case, memory blocks 0, 64, 128, ..., 4032 map into cache set 0, and they can
occupy either of the two block positions within this set. Having 64 sets means that the 6-
bit set field of the address determines which set of the cache might contain the desired
block. The tag field of the address must then be associatively compared to the tags of the
two blocks of the set to check if the desired block is present. This two-way associative
search is simple to implement.
The number of blocks per set is a parameter that can be selected to suit the requirements
of a particular computer. For the main memory and cache sizes in Figure 5.17, four
blocks per set can be accommodated by a 5-bit set field, eight blocks per set by a 4-bit set
field, and so on. The extreme condition of 128 blocks per set requires no set bits and
corresponds to the fully associative technique, with 12 tag bits. The other extreme of one
block per set is the direct-mapping method. A cache that has k blocks per set is referred to
as a k-way set-associative cache.
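The trade-off described in this paragraph, between set-field width and tag width as the associativity k varies, can be tabulated for the example cache:

```python
# Set-field and tag-field widths as a function of blocks per set (k),
# for the example cache: 128 blocks, 16 words/block, 16-bit addresses.

import math

CACHE_BLOCKS = 128
WORD_BITS = 4        # 16 words per block
ADDR_BITS = 16

def field_widths(k):
    """For a k-way set-associative cache, return (set_bits, tag_bits)."""
    sets = CACHE_BLOCKS // k
    set_bits = int(math.log2(sets))
    tag_bits = ADDR_BITS - WORD_BITS - set_bits
    return set_bits, tag_bits
```

As the text states, k = 1 gives direct mapping (7 set bits, 5 tag bits), k = 2 gives a 6-bit set field, and k = 128 gives the fully associative case (no set bits, 12 tag bits).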
One more control bit, called the valid bit, must be provided for each block. This bit
indicates whether the block contains valid data. It should not be confused with the
modified, or dirty, bit mentioned earlier. The dirty bit, which indicates whether the block
has been modified during its cache residency, is needed only in systems that do not use
the write-through method. The valid bits are all set to 0 when power is initially applied to
the system or when the main memory is loaded with new programs and data from the
disk. Transfers from the disk to the main memory are carried out by a DMA mechanism.
Normally, they bypass the cache for both cost and performance reasons. The valid bit of a
particular cache block is set to 1 the first time this block is loaded from the main memory.
Whenever a main memory block is updated by a source that bypasses the cache, a check
is made to determine whether the block being loaded is currently in the cache. If it is, its
valid bit is cleared to 0. This ensures that stale data will not exist in the cache.
A similar difficulty arises when a DMA transfer is made from the main memory
to the disk, and the cache uses the write-back protocol. In this case, the data in the
memory might not reflect the changes that may have been made in the cached copy.
One solution to this problem is to flush the cache by forcing the dirty data to be written
back to the memory before the DMA transfer takes place. The operating system can do
this easily, and it does not affect performance greatly, because such disk transfers do not
occur often. This need to ensure that two different entities (the processor and DMA
subsystems in this case) use the same copies of data is referred to as a cache-coherence
problem.
4.9 REPLACEMENT ALGORITHMS
In a direct-mapped cache, the position of each block is predetermined; hence, no
replacement strategy exists. In associative and set-associative caches there exists some
flexibility. When a new block is to be brought into the cache and all the positions that it
may occupy are full, the cache controller must decide which of the old blocks to
overwrite. This is an important issue because the decision can be a strong determining
factor in system performance. In general, the objective is to keep blocks in the cache that
are likely to be referenced in the near future. However, it is not easy to determine which
blocks are about to be referenced. The property of locality of reference in programs gives
a clue to a reasonable strategy. Because programs usually stay in localized areas for
reasonable periods of time, there is a high probability that the blocks that have been
referenced recently will be referenced again soon. Therefore, when a block is to be
overwritten, it is sensible to overwrite the one that has gone the longest time without
being referenced. This block is called the least recently used (LRU) block, and the
technique is called the LRU replacement algorithm.
To use the LRU algorithm, the cache controller must track references to all blocks as
computation proceeds. Suppose it is required to track the LRU block of a four-block set
in a set-associative cache. A 2-bit counter can be used for each block. When a hit occurs,
the counter of the block that is referenced is set to 0. Counters with values originally
lower than the referenced one are incremented by one, and all others remain unchanged.
When a miss occurs and the set is not full, the counter associated with the new block
loaded from the main memory is set to 0, and the values of all other counters are
increased by one. When a miss occurs and the set is full, the block with the counter value
3 is removed, the new block is put in its place, and its counter is set to 0. The other three
block counters are incremented by one. It can be easily verified that the counter values of
occupied blocks are always distinct.
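The 2-bit-counter scheme just described can be simulated for one four-block set; the block names are illustrative.

```python
# Simulation of the 2-bit-counter LRU scheme for a four-block set:
# counter 0 = most recently used, counter 3 = least recently used.

counters = {}   # block -> counter value

def reference(block):
    """Apply the counter-update rules; return the evicted block, if any."""
    if block in counters:                          # hit
        old = counters[block]
        for b in counters:                         # bump only lower counters
            if counters[b] < old:
                counters[b] += 1
        counters[block] = 0
        return None
    if len(counters) == 4:                         # miss, set full: evict LRU
        victim = next(b for b, c in counters.items() if c == 3)
        del counters[victim]
    else:
        victim = None                              # miss, set not full
    for b in counters:
        counters[b] += 1
    counters[block] = 0
    return victim

for blk in ["A", "B", "C", "D"]:
    reference(blk)
evicted = reference("E")   # set is full; "A" has counter 3 and is evicted
```

Note that after every reference the occupied blocks hold the distinct counter values 0 through 3, as the text observes.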
The LRU algorithm has been used extensively. Although it performs well for many
access patterns, it can lead to poor performance in some cases. For example, it produces
disappointing results when accesses are made to sequential elements of an array that is
that is slightly too large to fit into the cache (see Section 5.5.3 and Problem 5.12).
The performance of the LRU algorithm can be improved by introducing a small amount
of randomness in deciding which block to replace.
Several other replacement algorithms are also used in practice. An intuitively reasonable
rule would be to remove the "oldest" block from a full set when a new block must be
brought in. However, because this algorithm does not take into account the recent pattern
of access to blocks in the cache, it is generally not as effective as the LRU algorithm in
choosing the best blocks to remove. The simplest algorithm is to randomly choose the
block to be overwritten. Interestingly enough, this simple algorithm has been found to be
quite effective in practice.
4.10 MEMORY INTERLEAVING
If the main memory of a computer is structured as a collection of physically separate
modules, each with its own address buffer register (ABR) and data buffer register (DBR),
memory access operations may proceed in more than one module at the same time.
Thus, the aggregate rate of transmission of words to and from the main memory system
can be increased.
How individual addresses are distributed over the modules is critical in determining the
average number of modules that can be kept busy as computations proceed. Two methods
of address layout are indicated in Figure 5.25. In the first case, the memory address
generated by the processor is decoded as shown in Figure 5.25a. The high- order k bits
name one of n modules, and the low-order m bits name a particular word in that module.
When consecutive locations are accessed, as happens when a block of data is
transferred to a cache, only one module is involved. At the same time, however, devices
with direct memory access (DMA) ability may be accessing information in other memory
modules.
The second and more effective way to address the modules is shown in Figure 5.25b. It
is called memory interleaving. The low-order k bits of the memory address select a
module, and the high-order m bits name a location within that module. In this way,
consecutive addresses are located in successive modules. Thus, any component of the
system that generates requests for access to consecutive memory locations can keep
several modules busy at any one time. This results in both faster accesses to a block of
data and higher average utilization of the memory system as a whole. To implement the
interleaved structure, there must be 2^k modules; otherwise, there will be gaps of
nonexistent locations in the memory address space.
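The two address layouts can be sketched as follows. This is a toy illustration with assumed field widths (K module-select bits, M word bits), not a reproduction of Figure 5.25 itself:

```python
# Toy model of the two address layouts for 2**K modules of 2**M words
# each (K = 2 and M = 4 are assumed sizes, not from the text).

K, M = 2, 4                       # 4 modules, 16 words per module

def consecutive_layout(addr):
    """Figure 5.25a style: high-order K bits select the module."""
    return addr >> M, addr & ((1 << M) - 1)

def interleaved_layout(addr):
    """Figure 5.25b style: low-order K bits select the module."""
    return addr & ((1 << K) - 1), addr >> K

# Consecutive addresses 0..3 fall in one module under the first scheme,
# but are spread across all four modules when interleaved.
```

The final comment is the point of interleaving: a block transfer touching consecutive addresses can keep several modules busy at once, instead of queueing up on a single module.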
HIT RATE AND MISS PENALTY
An excellent indicator of the effectiveness of a particular implementation of the memory
hierarchy is the success rate in accessing information at various levels of the hierarchy.
Recall that a successful access to data in a cache is called a hit. The number of hits stated
as a fraction of all attempted accesses is called the hit rate, and the miss rate is the
number of misses stated as a fraction of attempted accesses.
Ideally, the entire memory hierarchy would appear to the processor as a single memory
unit that has the access time of a cache on the processor chip and the size of a magnetic
disk. How close we get to this ideal depends largely on the hit rate at different levels of
the hierarchy. High hit rates, well over 0.9, are essential for high-performance computers.
Performance is adversely affected by the actions that must be taken after a miss. The
extra time needed to bring the desired information into the cache is called the miss
penalty. This penalty is ultimately reflected in the time that the processor is stalled
because the required instructions or data are not available for execution. In general, the
miss penalty is the time needed to bring a block of data from a slower unit in the memory
hierarchy to a faster unit. The miss penalty is reduced if efficient mechanisms for
transferring data between the various units of the hierarchy are implemented. The
previous section shows how an interleaved memory can reduce the miss penalty
substantially.
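The combined effect of hit rate and miss penalty is often summarized by an average access time, t_avg = h*C + (1 - h)*M, where h is the hit rate, C the cache access time, and M the miss penalty. The function below is our own illustrative sketch of this standard formula (the symbol names are assumptions chosen to match the discussion):

```python
def average_access_time(h, cache_time, miss_penalty):
    """Average memory access time for a single cache level:
    t_avg = h*C + (1 - h)*M."""
    return h * cache_time + (1 - h) * miss_penalty

# With a 0.95 hit rate, a 1-cycle cache and a 17-cycle miss penalty,
# the average access is 0.95*1 + 0.05*17 = 1.8 cycles.
```

Even a 5% miss rate nearly doubles the effective access time in this example, which is why hit rates well over 0.9 and small miss penalties are both essential.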
Input –Output
5.1 Printers
In computing, a printer is a peripheral which produces a hard copy (permanent human-
readable text and/or graphics) of documents stored in electronic form, usually on physical
print media such as paper or transparencies. Many printers are primarily used as local
peripherals, and are attached by a printer cable or, in most newer printers, a USB cable to
a computer which serves as a document source. Some printers, commonly known as
network printers, have built-in network interfaces (typically wireless or Ethernet), and
can serve as a hardcopy device for any user on the network. Individual printers are often
designed to support both local and network connected users at the same time.
In addition, a few modern printers can directly interface to electronic media such as
memory sticks or memory cards, or to image capture devices such as digital cameras,
scanners; some printers are combined with scanners and/or fax machines in a single
unit, and can function as photocopiers. Printers that include non-printing features are
sometimes called Multifunction Printers (MFP), Multi-Function Devices (MFD), or All-
In-One (AIO) printers. Most MFPs include printing, scanning, and copying among their
features. A Virtual printer is a piece of computer software whose user interface and API
resemble that of a printer driver, but which is not connected with a physical computer
printer.
Printers are designed for low-volume, short-turnaround print jobs, requiring virtually no
setup time to achieve a hard copy of a given document. However, printers are generally
slow devices (30 pages per minute is considered fast; and many inexpensive consumer
printers are far slower than that), and the cost per page is actually relatively high.
However this is offset by the on-demand convenience and project management costs
being more controllable compared to an out-sourced solution. The printing press naturally
remains the machine of choice for high-volume, professional publishing. However, as
printers have improved in quality and performance, many jobs which used to be done by
professional print shops are now done by users on local printers; see desktop publishing.
The world's first computer printer was a 19th century mechanically driven apparatus
invented by Charles Babbage for his Difference Engine.
Printing technology
Printers are routinely classified by the underlying print technology they employ;
numerous such technologies have been developed over the years. The choice of print
engine has a substantial effect on what jobs a printer is suitable for, as different
technologies are capable of different levels of image/text quality, print speed, low cost,
noise; in addition, some technologies are inappropriate for certain types of physical
media (such as carbon paper or transparencies).
Another aspect of printer technology that is often forgotten is resistance to alteration:
liquid ink such as from an inkjet head or fabric ribbon becomes absorbed by the paper
fibers, so documents printed with liquid ink are more difficult to
alter than documents printed with toner or solid inks, which do not penetrate below the
paper surface.
Checks should either be printed with liquid ink or on special "check paper with toner
anchorage".
For similar reasons carbon film ribbons for IBM Selectric typewriters bore
labels warning against using them to type negotiable instruments such as checks. The
machine-readable lower portion of a check, however, must be printed using MICR toner
or ink. Banks and other clearing houses employ automation equipment that relies on the
magnetic flux from these specially printed characters to function properly.
Types of Printers:
1. Dot Matrix Printer
2. Inkjet Printer
3. Laser Printer
Dot matrix Printer Ink jet Printer Laser Printer
Modern print technology
The following printing technologies are routinely found in modern printers:
Toner-based printers
Laser printer
Toner-based printers work using the Xerographic principle that is used in most
photocopiers: by adhering toner to a light-sensitive print drum, then using static
electricity to transfer the toner to the printing medium to which it is fused with heat and
pressure.
The most common type of toner-based printer is the laser printer, which uses precision
lasers to cause toner adherence. Laser printers are known for high quality prints, good
print speed, and a low (Black and White) cost-per-copy. They are the most common
printer for many general-purpose office applications, but are much less common as
consumer printers due to their high initial cost - although this cost is dropping.
Laser printers are available in both color and monochrome varieties.
Another toner based printer is the LED printer which uses an array of LEDs instead of a
laser to cause toner adhesion to the print drum.
Recent research has also indicated that Laser printers emit potentially dangerous ultrafine
particles, possibly causing health problems associated with respiration, and cause
pollution equivalent to cigarettes.
The degree of particle emissions varies with age,
model and design of each printer but is generally proportional to the amount of toner
required. Furthermore, a well ventilated workspace would allow such ultrafine particles
to disperse thus reducing the health side effects.
Liquid inkjet printers
Inkjet printer
Inkjet printers operate by propelling variably-sized droplets of liquid or molten material
(ink) onto almost any sized page. They are the most common type of computer printer for
the general consumer
due to their low cost, high quality of output, capability
of printing in vivid color, and ease of use.
Solid ink printers
Solid ink
Solid Ink printers, also known as phase-change printers, are a type of thermal transfer
printer. They use solid sticks of CMYK colored ink (similar in consistency to candle
wax), which are melted and fed into a piezo crystal operated print-head. The printhead
sprays the ink on a rotating, oil coated drum. The paper then passes over the print drum,
at which time the image is transferred, or transfixed, to the page.
Solid ink printers are most commonly used as color office printers, and are excellent at
printing on transparencies and other non-porous media. Solid ink printers can produce
excellent results. Acquisition and operating costs are similar to laser printers. Drawbacks
of the technology include high power consumption and long warm-up times from a cold
state.
Also, some users complain that the resulting prints are difficult to write on (the wax tends
to repel inks from pens), and are difficult to feed through Automatic Document Feeders,
but these traits have been significantly reduced in later models. In addition, this type of
printer is only available from one manufacturer, Xerox, manufactured as part of their
Xerox Phaser office printer line. Previously, solid ink printers were manufactured by
Tektronix, but Tek sold the printing business to Xerox in 2001.
Dye-sublimation printers
Dye-sublimation printer
A dye-sublimation printer (or dye-sub printer) is a printer which employs a printing
process that uses heat to transfer dye to a medium such as a plastic card, paper or canvas.
The process is usually to lay one color at a time using a ribbon that has color panels. Dye-
sub printers are intended primarily for high-quality color applications, including color
photography; and are less well-suited for text. While once the province of high-end print
shops, dye-sublimation printers are now increasingly used as dedicated consumer photo
printers.
Inkless printers
Thermal printers
Thermal printer
Thermal printers work by selectively heating regions of special heat-sensitive paper.
Monochrome thermal printers are used in cash registers, ATMs, gasoline dispensers and
some older inexpensive fax machines. Colors can be achieved with special papers and
different temperatures and heating rates for different colors. One example is the ZINK
technology.
UV printers
Xerox is working on an inkless printer which will use a special reusable paper coated
with a few micrometres of UV light sensitive chemicals. The printer will use a special
UV light bar which will be able to write and erase the paper. As of early 2007 this
technology is still in development and the text on the printed pages can only last between
16-24 hours before fading.
Printing speed
The speed of early printers was measured in units of characters per second. More
modern printers are measured in pages per minute. These measures are used primarily as
a marketing tool, and are not well standardised. Usually pages per minute refers to sparse
monochrome office documents, rather than dense pictures which usually print much more
slowly. PPM figures usually refer to A4 paper in Europe and letter paper in the
US, resulting in a 5-10% difference.
5.2 Plotters
A plotter is a vector graphics printing device that connects to a computer and prints
graphical plots. It draws images with ink pens, tracing point-to-point lines directly
from vector graphics files. The plotter was the first computer
output device that could print graphics as well as accommodate full-size engineering and
architectural drawings. Using different colored pens, it was also able to print in color long
before inkjet printers became an alternative. There are different types of plotters:
Drum Plotters
Electrostatic plotters
Flat Bed Plotters
Inkjet Plotters
Pen Plotters
Drum Plotters
A type of pen plotter that wraps the paper around a drum with a pin feed
attachment. The drum turns to produce one direction of the plot, and the pens move to
provide the other. The plotter was the first output device to print graphics and large
engineering drawings. Using different colored pens, it could draw in color long before
color inkjet printers became viable.
Electrostatic Plotters
This plotter uses an electrostatic method of printing. Liquid toner models use a
positively charged toner that is attracted to paper which is negatively charged by passing
by a line of electrodes (tiny wires or nibs). Models print in black and white or color, and
some handle paper up to six feet wide. Newer electrostatic plotters are really large-format
laser printers and focus light onto a charged drum using lasers or LEDs.
Flatbed Plotters
This is a graphics plotter that contains a flat surface that the paper is placed on.
The size of this surface (bed) determines the maximum size of the drawing.
Inkjet Plotters
This is a printer that propels droplets of ink directly onto the medium. Today,
almost all inkjet printers produce color. Low-end inkjets use three ink colors (cyan,
magenta and yellow), but produce a composite black that is often muddy. Four-color
inkjets (CMYK) use black ink for pure black printing. Inkjet printers run the gamut from
less than a hundred to a couple hundred dollars for home use to tens of thousands of
dollars for commercial poster printers
Pen Plotters
Pen plotters print by moving a pen across the surface of a piece of paper. This
means that plotters are restricted to line art, rather than raster graphics as with other
printers. Pen plotters can draw complex line art, including text, but do so very slowly
because of the mechanical movement of the pens. Pen plotters are incapable of creating a
solid region of color, but can hatch an area by drawing a number of close, regular lines.
When computer memory was very expensive and processors were very slow, this
was often the fastest way to produce color high-resolution vector-based artwork, or very
large drawings efficiently.
5.3 Displays
A display device is an output device for presentation of information for visual or
tactile reception, acquired, stored, or transmitted in various forms. When the input
information is supplied as an electrical signal, the display is called an electronic display.
A display device is anything that puts images on a screen, giving the user visual
confirmation of input and actions. The most common display is the default monitor,
which means that by default any monitor should work if it is installed before the power
is turned on. The screen has dials that can make the display seem blank, and sometimes
adjustments must be made to the display itself.
CRT MONITOR [cathode-ray tube]
Features
1. High Voltage Device
2. Two connections present in a CRT:
i) To the AC power outlet
ii) To the System Unit (DB-15, DVI)
Disadvantages of CRT
They have a big back and take up space on desk.
The electromagnetic fields emitted by CRT monitors constitute a health hazard to
the functioning of living cells.
CRTs emit a small amount of X-ray band radiation which can result in a health
hazard.
Constant refreshing of CRT monitors can result in headache.
CRTs operate at very high voltage which can overheat system or result in an
implosion
Within a CRT a strong vacuum exists in it and can also result in a implosion
They are heavy to pick up and carry around
Advantages of CRT
The cathode rayed tube can easily increase the monitor‘s brightness by reflecting
the light.
They produce more colours
The Cathode Ray Tube monitors have lower price rate than the LCD display or
Plasma display.
The quality of the image displayed on a Cathode Ray Tube is superior to the
LCD and Plasma monitors.
The contrast features of the cathode ray tube monitor are considered highly
excellent.
How CRTs work & display?
A CRT monitor contains millions of tiny red, green, and blue phosphor dots that
glow when struck by an electron beam that travels across the screen to create a visible
image. In a CRT monitor tube, the cathode is a heated filament. The heated filament is
in a vacuum created inside a glass tube. The electrons are negatively charged and the
screen is positively charged, so the electrons are attracted to the screen, making it glow.
LCD Monitor
Flat panel display
Features
1) HVD in Desktops and LVD in Laptops
2) Two connections present in a HVD LCD Monitor
i) To the AC power outlet
ii) To the System Unit
3) Highly energy efficient.
Flat panel displays encompass a growing number of technologies enabling
video displays that are lighter and much thinner than traditional television and video
displays that use cathode ray tubes, and are usually less than 4 inches (100 mm) thick.
They can be divided into two general categories: Volatile or Static. Flat panel displays
balance their smaller footprint and trendy modern look with high production costs and in
many cases inferior images compared with traditional CRTs. In many applications,
specifically modern portable devices such as laptops, cellular phones, and digital
cameras, whatever disadvantages exist are overcome by the portability requirements.
Volatile
Volatile displays require pixels be periodically refreshed to retain their state,
even for a static image. This refresh typically occurs many times a second. If this is not
done, the pixels will gradually lose their coherent state, and the image will "fade" from
the screen.
Examples of volatile flat panel displays
Plasma displays
Liquid crystal displays (LCDs)
Organic light-emitting diode displays (OLEDs)
Light-emitting diode display (LED)
Electroluminescent displays (ELDs)
Surface-conduction electron-emitter displays (SEDs)
Field emission displays (FEDs)
Nano-emissive display (NEDs)
Static
Static flat panel displays rely on materials whose color states are bistable. This
means that the image they hold requires no energy to maintain, but instead requires
energy to change. This results in a much more energy-efficient display, but with a
tendency towards slow refresh rates which are undesirable in an interactive display.
Examples of static flat panel displays
electrophoretic displays (e.g. E Ink's electrophoretic imaging film)
bichromal ball displays (e.g. Xerox's Gyricon)
Interferometric modulator displays (e.g. Qualcomm's iMod, a MEMS
display.)
Cholesteric displays (e.g. MagInk, Kent Displays)
Bistable nematic liquid crystal displays (e.g. ZBD)
5.4 Keyboard
Keyboards are designed for the input of text and characters and also to control the
operation of a computer.
Types of Keyboards
1. Based on Layout
-QWERTY layout
-DVORAK layout
2. Ergonomic – Based on comfort
Standard Keyboards
The number of keys on a keyboard varies from the original standard of 101 keys to the
104-key Windows keyboards.
Qwerty Layout
Dvorak Keyboard
Ergonomic Keyboard
Differences between Dvorak and Qwerty
Typing a 62-word paragraph, Dvorak used between 20% and 35% less movement, and
saved almost 6 feet of the 16 feet of finger movement needed to type these short
paragraphs with Qwerty. This is the minimum difference. In actual practice, nearly
all typists would show more savings, with a range of up to about 50%.
LAYOUT OF A KEYBOARD
Layout of a keyboard can be divided into five sections:-
i) Typical Keys
These keys include the letter and number keys [1, 2, A, B, etc.], which are generally
laid out in the same style that was common for typewriters.
ii) Numeric Keypad:-
Numeric keys are located on the right hand side of the keyboard.
Generally it consists of a set of 17 keys that are laid out in the same
configuration used by adding machines and calculators.
iii) Function keys:-
The function keys [F1,F2,F3 etc] are arranged in a row along the top of the
keyboard and could be assigned specific commands by current application or
the operating system.
iv) Control keys:-
These keys provide cursor and screen control. They include 4 directional
arrow keys, arranged in an inverted T formation between the typing
keys and the numeric keypad. Control keys also include Home, End, Insert,
Delete, Page up, Control [ctrl], Page down, Alternate [alt] & Escape [Esc].
The Windows keyboard also consists of two windows or start keys and an
Application key.
v) Special Purpose Keys:-
Apart from the above discussed keys, a keyboard contains some special
purpose keys such as Enter, Shift, Caps lock, Num lock, Space bar, Tab and
Print Screen.
WORKING
When the user presses a key, a code corresponding to that key press is sent to
the operating system. A copy of this code is also stored in the keyboard‘s memory. When
the operating system reads the scan code, it informs the keyboard, and the scan
code stored in the keyboard‘s memory is then erased. The action corresponding to the
code is then performed. If the user holds down a key, the processor determines that the
user wishes to send that character repeatedly to the computer. In this process, the delay
between each instance of the character can normally be set in the operating system, and
typically ranges from 2 to 30 characters per second (cps).
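The auto-repeat behaviour described above can be sketched as a small calculation. The initial-delay and repeat-rate defaults here are assumptions for illustration, not values given in the text:

```python
def events_sent(hold_ms, delay_ms=500, rate_cps=30):
    """Number of scan-code transmissions for a key held for hold_ms
    milliseconds: one for the key press, then, after an initial delay,
    repeats at rate_cps characters per second."""
    if hold_ms < delay_ms:
        return 1                                   # the key press only
    # press + first repeat at delay_ms + further repeats thereafter
    return 2 + (hold_ms - delay_ms) * rate_cps // 1000
```

For example, tapping a key sends one code, while holding it for 1.5 seconds at 30 cps with a 500 ms delay sends 32.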
Changing the Keyboard Layout
Start > Control Panel > Regional & Language Options >Language > Details > Add >
Enable Keyboard Layout > United States - Dvorak
5.5 Mouse
A mouse (plural mice or mouses) functions as a pointing device by detecting two-
dimensional motion relative to its supporting surface. It can be used only with a GUI
(Graphical User Interface) based OS, e.g. Windows.
Types of mouse based on mechanism
1. Mechanical Mouse
2. Optical Mouse
Mechanical Mouse
A mouse that uses a rubber ball that makes contact with wheels inside the unit when it is
rolled on a pad or desktop.
Optical Mouse
A mouse that uses light to detect movement. It emits a light and senses its reflection as it
is moved. Early optical mice required a special mouse pad, but today's devices can be
rolled over traditional pads like a mechanical mouse as well as over almost any surface
other than glass or mirror.
5.6 Optical Mark Reader
The Optical Mark Reader is a device that "reads" pencil marks on NCS
compatible scan forms such as surveys or test answer forms. Optical Mark Reader is a
scanning device that can read marks such as pencil marks on a page; used to read forms
and multiple-choice questionnaires. Think of it as the machine that checks multiple
choice computer forms. In this document The Optical Mark Reader will be referred to as
the scanner or OMR. The computer test forms designed for the OMR are known as NCS
compatible scan forms. Tests and surveys completed on these forms are read in by the
scanner, checked, and the results are saved to a file. This data file can be converted into
an output file of several different formats, depending on which type of output you desire.
The OMR is a powerful tool that has many features. While using casstat
(grading tests), the OMR will print the number of correct answers and the percentage of
correct answers at the bottom of each test. It will also record statistical data about each
question. This data is recorded in the output file created when the forms are scanned.
5.7 Optical character recognition
Optical character recognition, usually abbreviated to OCR, is the mechanical or
electronic translation of
images of handwritten, typewritten or printed text (usually captured by a scanner) into
machine-editable text.
OCR is a field of research in pattern recognition, artificial intelligence and
machine vision. Though academic research in the field continues, the focus on OCR has
shifted to implementation of proven techniques. Optical character recognition (using
optical techniques such as mirrors and lenses) and digital character recognition (using
scanners and computer algorithms) were originally considered separate fields. Because
very few applications survive that use true optical techniques, the OCR term has now
been broadened to include digital image processing as well.
Early systems required training (the provision of known samples of each
character) to read a specific font. "Intelligent" systems with a high degree of recognition
accuracy for most fonts are now common. Some systems are even capable of reproducing
formatted output that closely approximates the original scanned page including images,
columns and other non-textual components.
5.8 Device interface
An interface device (IDF) is a hardware component or system of components that allows
a human being to interact with a computer, a telephone system, or other electronic
information system. The term is often encountered in the mobile communication industry
where designers are challenged to build the proper combination of portability, capability,
and ease of use into the interface device. The overall set of characteristics provided by an
interface device is often referred to as the user interface (and, for computers - at least, in
more academic discussions - the human-computer interface or HCI ). Today's desktop
and notebook computers have what has come to be called a graphical user interface
(GUI) to distinguish it from earlier, more limited interfaces such as the command line
interface (CLI).
The Graphics Device Interface (GDI) is a Microsoft Windows application programming
interface and core operating system component responsible for representing graphical
objects and transmitting them to output devices such as monitors and printers.
GDI is responsible for tasks such as drawing lines and curves, rendering fonts and
handling palettes. It is not directly responsible for drawing windows, menus, etc.; that
task is reserved for the user subsystem, which resides in user32.dll and is built atop GDI.
GDI is similar to Macintosh's QuickDraw.
Perhaps the most significant capability of GDI over more direct methods of accessing the
hardware is its scaling capabilities, and abstraction of target devices. Using GDI, it is
very easy to draw on multiple devices, such as a screen and a printer, and expect proper
reproduction in each case. This capability is at the center of all What You See Is What
You Get applications for Microsoft Windows.
A human interface device or HID is a type of computer device that interacts directly
with, and most often takes input from, humans and may deliver output to humans. The
term "HID" most commonly refers to the USB-HID specification.
5.9 I/O processor
An I/O processor (IOP) is a specialized computer that permits autonomous handling of data
between I/O devices and a central computer or the central memory of the computer. It can
be a programmable computer in its own right; in earlier forms, as a wired-program
computer, it was called a channel controller. See also direct memory access. Many
storage, networking, and embedded applications require fast I/O throughput for optimal
performance. Intel® I/O processors allow servers, workstations and storage subsystems
to transfer data faster, reduce communication bottlenecks, and improve overall system
performance by offloading I/O processing functions from the host CPU.
5.10 Standard I/O Interfaces
Interface circuitry designed for one computer does not support another; a separate
interface would have to be designed for each computer, resulting in a large number of
interfaces. To avoid this, a number of standards have been developed for the expansion
bus, e.g. SCSI, PCI and USB.
Small Computer Systems Interface (SCSI)
Parallel interface
8, 16, 32 bit data lines
Daisy chained
Devices are independent
Devices can communicate with each other as well as host
SCSI - 1
Early 1980s
8 bit
5MHz
Data rate 5 Mbytes/s
Seven devices
o Eight including host interface
SCSI - 2
1991
16 and 32 bit
10MHz
Data rate 20 or 40 Mbytes/s
o Check out Ultra/Wide SCSI
SCSI -3
1993
16 bits
Data rate 40 Mbytes/s over a 68-pin connector
Supports up to 16 devices
PCI BUS
PCI Local Bus (usually shortened to PCI), or Conventional PCI, specifies a
computer bus for attaching peripheral devices to a computer motherboard. These devices
can take either the form of an integrated circuit fitted onto the motherboard itself, called a
planar device in the PCI specification or an expansion card that fits into a socket. The
name PCI is an initialism formed from Peripheral Component Interconnect. The PCI bus is
common in modern PCs, where it has displaced ISA and VESA Local Bus as the standard
expansion bus, and it also appears in many other computer types. Despite the availability
of faster interfaces such as PCI-X and PCI Express, conventional PCI remains a very
common interface.
The PCI specification covers the physical size of the bus (including wire
spacing), electrical characteristics, bus timing, and protocols. The specification can be
purchased from the PCI Special Interest Group (PCI-SIG).
Typical PCI cards used in PCs include: network cards, sound cards, modems,
extra ports such as USB or serial, TV tuner cards and disk controllers. Historically video
cards were typically PCI devices, but growing bandwidth requirements soon outgrew the
capabilities of PCI. PCI video cards remain available for supporting extra monitors and
upgrading PCs that do not have any AGP or PCI Express slots.
USB
Universal Serial Bus (USB) is a serial bus standard to interface devices to a
host computer. USB was designed to allow many peripherals to be connected using a
single standardized interface socket and to improve the plug-and-play capabilities by
allowing hot swapping, that is, by allowing devices to be connected and disconnected
without rebooting the computer or turning off the device. Other convenient features
include providing power to low-consumption devices without the need for an external
power supply and allowing many devices to be used without requiring manufacturer
specific, individual device drivers to be installed.
USB is intended to replace many legacy varieties of serial and parallel ports. USB
can connect computer peripherals such as mice, keyboards, PDAs, gamepads and
joysticks, scanners, digital cameras, printers, personal media players, and flash drives.
For many of those devices USB has become the standard connection method. USB was
originally designed for personal computers, but it has become commonplace on other
devices such as PDAs and video game consoles, and as a bridging power cord between a
device and an AC adapter plugged into a wall plug for charging purposes. As of 2008,
there are about 2 billion USB devices in the world.
RS 232 C
RS-232C is a long-established standard ("C" is the current version) that describes the
physical interface and protocol for relatively low-speed serial data communication
between computers and related devices. It was defined by an industry trade group, the
Electronic Industries Association (EIA), originally for teletypewriter devices.
RS-232C is the interface that your computer uses to talk to and exchange data with your
modem and other serial devices. Somewhere in your PC, typically on a Universal
Asynchronous Receiver/Transmitter (UART) chip on your motherboard, the data from
your computer is transmitted to an internal or external modem (or other serial device)
from its Data Terminal Equipment (DTE) interface. Since data in your computer flows
along parallel circuits and serial devices can handle only one bit at a time, the UART chip
converts the groups of bits in parallel to a serial stream of bits. As your PC's DTE agent,
it also communicates with the modem or other serial device, which, in accordance with
the RS-232C standard, has a complementary interface called the Data Communications
Equipment (DCE) interface.
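The parallel-to-serial conversion the UART performs can be sketched in a few lines. The frame format shown here (one start bit, eight data bits sent LSB first, one stop bit, commonly written "8N1") is a typical asynchronous configuration, not something mandated by RS-232 itself:

```python
def uart_frame_8n1(byte):
    """Serialize one byte as an 8N1 asynchronous frame:
    start bit (0), eight data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("need a single byte")
    data_bits = [(byte >> i) & 1 for i in range(8)]   # LSB first
    return [0] + data_bits + [1]

def uart_deframe_8n1(bits):
    """Recover the byte from a 10-bit 8N1 frame, checking framing."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(b << i for i, b in enumerate(bits[1:9]))

frame = uart_frame_8n1(ord('A'))   # 'A' = 0x41
print(frame)                       # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(hex(uart_deframe_8n1(frame)))  # 0x41
```

A real UART also handles baud-rate timing and optional parity; this sketch shows only the bit-level framing.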
In telecommunications, RS-232 (Recommended Standard 232) is a standard for serial
binary data signals connecting between a DTE (Data Terminal Equipment) and a DCE
(Data Circuit-terminating Equipment). It is commonly used in computer serial ports. A
similar ITU-T standard is V.24.
RS-232C is short for Recommended Standard 232C, a standard interface approved by the
Electronic Industries Alliance (EIA) for connecting serial devices. In 1987, the EIA released
a new version of the standard and changed the name to EIA-232-D, and in 1991 the EIA teamed
up with the Telecommunications Industry Association (TIA) and issued a new version called
EIA/TIA-232-E. Many people, however, still refer to the standard as RS-232C, or just RS-232.
Almost all modems conform to the EIA-232 standard and most personal computers have an EIA-
232 port for connecting a modem or other device. In addition to modems, many display screens,
mice, and serial printers are designed to connect to an EIA-232 port. In EIA-232 parlance, the device
that connects to the interface is called a Data Communications Equipment (DCE) and the device to
which it connects (e.g., the computer) is called a Data Terminal Equipment (DTE).
The EIA-232 standard supports two types of connectors -- a 25-pin D-type connector (DB-25) and
a 9-pin D-type connector (DB-9). The type of serial communications used by PCs requires only 9
pins so either type of connector will work equally well.
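The nine signals that the DB-9 connector carries can be tabulated directly; these are the standard EIA/TIA-574 assignments for the DTE side:

```python
# Standard DB-9 serial-port signal assignments (EIA/TIA-574, DTE side).
DB9_PINS = {
    1: "DCD  Data Carrier Detect",
    2: "RxD  Received Data",
    3: "TxD  Transmitted Data",
    4: "DTR  Data Terminal Ready",
    5: "GND  Signal Ground",
    6: "DSR  Data Set Ready",
    7: "RTS  Request To Send",
    8: "CTS  Clear To Send",
    9: "RI   Ring Indicator",
}

# Transmit and receive use separate pins (3 and 2), which is why the
# interface can operate in full duplex.
for pin in (2, 3):
    print(pin, DB9_PINS[pin])
```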
Although EIA-232 is still the most common standard for serial communication, the EIA has
recently defined successors to EIA-232 called RS-422 and RS-423. The new standards are
backward compatible so that RS-232 devices can connect to an RS-422 port.
Role in modern personal computers
In the book PC 97 Hardware Design Guide,[3] Microsoft deprecated support for the RS-232
compatible serial port of the original IBM PC design. Today, RS-232 is gradually
being replaced in personal computers by USB for local communications. Compared with
RS-232, USB is faster, uses lower voltages, and has connectors that are simpler to
connect and use. Both standards have software support in popular operating systems.
USB is designed to make it easy for device drivers to communicate with hardware.
However, there is no direct analog to the terminal programs used to let users
communicate directly with serial ports. USB is more complex than the RS-232 standard
because it includes a protocol for transferring data to devices. This requires more
software to support the protocol used. RS-232 only standardizes the voltage of signals
and the functions of the physical interface pins. Serial ports of personal computers are
also often used to directly control various hardware devices, such as relays or lamps,
since the control lines of the interface could be easily manipulated by software. This isn't
feasible with USB, which requires some form of receiver to decode the serial data.
As an alternative, USB docking ports are available which can provide connectors for a
keyboard, mouse, one or more serial ports, and one or more parallel ports. Corresponding
device drivers are required for each USB-connected device to allow programs to access
these USB-connected devices as if they were the original directly-connected peripherals.
Devices that convert USB to RS-232 may not work with all software on all personal
computers and may cause a reduction in bandwidth along with higher latency.
Personal computers may use the control pins of a serial port to interface to devices such
as uninterruptible power supplies. In this case, serial data is not sent, but the control lines
are used to signal conditions such as loss of power or low battery alarms.
Certain industries, in particular marine survey, provide a continued demand for RS-232
I/O due to sustained use of very expensive but aging equipment. It is far cheaper to
continue to use RS-232 than it is to replace the equipment. Some manufacturers have
responded to this demand: Toshiba re-introduced a DB-9 male connector on the Tecra
laptop, and companies such as Digi specialise in RS-232 I/O cards.
Standard details
In RS-232, user data is sent as a time-series of bits. Both synchronous and asynchronous
transmissions are supported by the standard. In addition to the data circuits, the standard
defines a number of control circuits used to manage the connection between the DTE and
DCE. Each data or control circuit only operates in one direction, that is, signaling from a
DTE to the attached DCE or the reverse. Since transmit data and receive data are separate
circuits, the interface can operate in a full duplex manner, supporting concurrent data
flow in both directions. The standard does not define character framing within the data
stream, or character encoding.
IEEE 488.2 (GPIB)
IEEE-488 is a short-range, digital communications bus specification that has been in use
for over 30 years. Originally created for use with automated test equipment, the standard
is still in wide use for that purpose. IEEE-488 is also commonly known as HP-IB
(Hewlett-Packard Interface Bus) and GPIB (General Purpose Interface Bus).
IEEE-488 allows up to 15 devices to share a single eight-bit parallel electrical bus by
daisy chaining connections. The slowest device participates in control and data transfer
handshakes to determine the speed of the transaction. The maximum data rate is about
one Mbyte/s in the original standard, and about 8 Mbyte/s with later extensions.
The IEEE-488 connector has 24 pins. The bus employs 16 signal lines — eight
bidirectional lines used for data transfer, three for handshaking, and five for bus
management — plus eight ground return lines.
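The consequence of the handshake described above — every listener must accept each byte before the talker proceeds, so the slowest device sets the pace — can be illustrated with a toy model. The device speeds in the example are invented for illustration:

```python
def gpib_transfer_time(num_bytes, listener_rates):
    """Toy model of a GPIB transfer: the handshake completes only when
    the slowest listener has accepted each byte, so the effective data
    rate is the minimum of the listeners' rates (bytes per second)."""
    if not listener_rates:
        raise ValueError("need at least one listener")
    effective_rate = min(listener_rates)
    return num_bytes / effective_rate

# A fast logic analyser (1 MB/s) and a slow plotter (10 kB/s)
# listening together: the plotter sets the pace.
print(gpib_transfer_time(1_000_000, [1_000_000, 10_000]))   # 100.0 seconds
```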
In 1975 the bus was standardized by the Institute of Electrical and Electronics Engineers
as the IEEE Standard Digital Interface for Programmable Instrumentation, IEEE-
488-1975 (now 488.1). IEEE-488.1 formalized the mechanical, electrical, and basic
protocol parameters of GPIB, but said nothing about the format of commands or data.
The IEEE-488.2 standard, Codes, Formats, Protocols, and Common Commands for
IEEE-488.1 (June 1987), provided for basic syntax and format conventions, as well as
device-independent commands, data structures, error protocols, and the like. IEEE-488.2
built on -488.1 without superseding it; equipment can conform to -488.1 without
following -488.2.
While IEEE-488.1 defined the hardware, and IEEE-488.2 defined the syntax, there was
still no standard for instrument-specific commands.
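Among the common commands IEEE-488.2 defines is `*IDN?`, whose response is four comma-separated fields: manufacturer, model, serial number, and firmware revision. A small sketch of parsing such a response; the instrument string in the example is invented for illustration:

```python
def parse_idn(response):
    """Split an IEEE-488.2 *IDN? response into its four comma-separated
    fields: manufacturer, model, serial number, firmware revision."""
    fields = [f.strip() for f in response.split(",")]
    if len(fields) != 4:
        raise ValueError("*IDN? response must have exactly four fields")
    keys = ("manufacturer", "model", "serial", "firmware")
    return dict(zip(keys, fields))

# Hypothetical response string, for illustration only:
idn = parse_idn("ACME Instruments,Model 123,SN0042,1.07")
print(idn["model"])   # Model 123
```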
Applications
At the outset, HP-IB's designers did not specifically plan for IEEE-488 to be a standard
peripheral interface for general-purpose computers. By 1977 the Commodore PET/CBM
range of educational/home/personal computers connected its disk drives, printers,
modems, etc., by the IEEE-488 bus. All of Commodore's post-PET/CBM 8-bit machines,
from the VIC-20 to the C128, utilized a proprietary 'serial IEEE-488' for peripherals, with
round DIN connectors instead of the heavy-duty HP-IB plugs or a card-edge connector
plugging into the motherboard (as on the PET computers).
Hewlett-Packard and Tektronix also used IEEE-488 as a peripheral interface to connect
disk drives, tape drives, printers, plotters etc. to their workstation products and HP's
HP 2100[4] and HP 3000[5] minicomputers. While the bus speed was increased to 10 MB/s
for such applications, the lack of command protocol standards limited third-party
offerings and interoperability, and later, faster, open standards such as SCSI eventually
superseded IEEE-488 for peripheral access.
Additionally, some of HP's advanced pocket calculators/computers of the 1980s, such as
the HP-41 and HP-71B series, could work with various instrumentation via an optional
HP-IB interface. The interface would connect to the calculator via an optional HP-IL
module.