
B.S Anangpuria Institute of Technology & Management Branch: CSE/IT (4th SEM)

Session-2009

Lecture – 1:

• Digital logics
• Boolean Algebra
• Logic Gates
• Truth table

Submitted by:

Prerna Mittal

Computer Architecture and Organization
CSE – 210-E

Unit - 1: Basic Principles


In this chapter we will be dealing with the basic digital circuits of our computer: what hardware components we are using, how these hardware components are related and interact with each other, and how this hardware is accessed or seen by the user.

This gives rise to the classification of our computer study into:

Computer Design: This is concerned with the hardware design of the computer. Here the designer decides on the specifications of the computer system.

Computer Organization: This is concerned with the way the hardware components operate and the way they are connected to form the computer system.

Computer Architecture: This is concerned with the structure and behavior of the computer as seen by the user. It includes the information formats, the instruction set and addressing modes for accessing memory.

In our course we will be dealing with computer architecture and organization.

Before starting with computer architecture and organization, let's discuss the components which make up the hardware, or organization, of the computer, which is composed of digital circuits.

Digital Computers: the term implies that the computer deals with digital information, i.e., information represented by binary digits (0 and 1).

Gates – blocks of hardware that produce 1 or 0 when the input logic requirements are satisfied.

Functions of gates can be described by:
• Truth Table
• Boolean Function
• Karnaugh Map

Table 1.1 lists the various logic gates.

[Table 1.1: symbols and truth tables of the logic gates – each gate maps binary digital input signals to a binary digital output signal.]


Boolean algebra

Algebra with binary (Boolean) variables and logic operations. Boolean Algebra is useful in the analysis and synthesis of digital logic circuits:

- Input and output signals can be represented by Boolean variables, and

- functions of the digital logic circuits can be represented by logic operations, i.e., Boolean function(s)

- From a Boolean function, a logic diagram can be constructed using AND, OR, and NOT (inverter) gates

Note: We can have many circuits for the same Boolean expression.


For example:

[Figure: example logic circuits for the same Boolean expression.]

Truth Table: The most elementary specification of the function of a digital logic circuit is the truth table – a table that describes the output values for all the combinations of the input values, called MINTERMS. With n input variables there are 2^n minterms; a small software sketch follows.
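To make the idea concrete, here is a minimal Python sketch (added for illustration, not part of the original notes) that prints the truth table of a sample function F(x, y, z) = xy + z and lists its minterms:

from itertools import product

def F(x, y, z):                          # sample function: F = xy + z
    return (x and y) or z

minterms = []
print(" x y z | F")
for i, (x, y, z) in enumerate(product([0, 1], repeat=3)):
    out = int(F(x, y, z))
    print(f" {x} {y} {z} | {out}")
    if out:
        minterms.append(i)               # row i is minterm m_i
print("Minterms:", minterms)             # 3 inputs -> 2^3 = 8 rows

For F = xy + z the sketch prints minterms [1, 3, 5, 6, 7].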

Summary:
• Computer Design: what hardware components we need.
• Computer Organization: how these hardware components interact.
• Computer Architecture: how these are connected with the user.
• Logic Gates: blocks of hardware giving a result of 0 or 1. There are 8 standard logic gates, of which 3 (AND, OR and NOT) are basic.
• Boolean Algebra: the representation of input and output signals in the form of expressions.
• Truth Table: a table that describes the output values for all the combinations of the input values.


Lecture – 2:

• Combinational logic Blocks
  - Multiplexers
  - Adders
  - Encoders
  - Decoders

Combinational circuits are circuits without memory, where the outputs are obtained from the inputs only. An n-input m-output combinational circuit is of the form shown below.

A multiplexer is the combinational circuit which selects one of many inputs depending on the selection criteria. The number of selection inputs depends on the number of inputs in the manner 2^x = y.

By this, if y is the number of inputs then x is the number of selection lines. Thus if we have 4 input lines, we use 2 selection lines, as 2^2 = 4, and so on. This is called a 4:1 (or 4*1) multiplexer; a software sketch follows the diagram below.

This has been explained in the diagram below.

[Figure: a combinational circuit block with n inputs and m outputs.]


Adders
- Half Adder
- Full Adder

Half Adder: adds 2 bits and gives carry and sum as result.

4-to-1 Multiplexer

[Figure: 4-to-1 multiplexer with data inputs I0–I3, select lines S1 S0, and output Y.]

Select S1 S0    Output Y
0  0            I0
0  1            I1
1  0            I2
1  1            I3
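The selection rule in the table can be modelled in a few lines of Python (an illustrative sketch, not from the notes; the data values on I0–I3 are made up):

def mux4to1(inputs, s1, s0):
    return inputs[s1 * 2 + s0]           # index = binary value of select lines S1 S0

I = [10, 20, 30, 40]                     # sample data on input lines I0..I3
print(mux4to1(I, 0, 0))                  # S1 S0 = 0 0 -> Y = I0 -> 10
print(mux4to1(I, 1, 1))                  # S1 S0 = 1 1 -> Y = I3 -> 40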


Full Adder: Adds 2 bits with carry in and gives carry out and sum as result.

[Figure: half-adder truth table and digital circuit.]

Half adder (inputs x, y; outputs carry c, sum s):

x y   c s
0 0   0 0
0 1   0 1
1 0   0 1
1 1   1 0

c = xy
s = xy' + x'y = x ⊕ y

[Figure: full-adder truth table, K-maps and digital circuit.]

Full adder (inputs x, y, Cin; outputs Cout, S):

x y Cin   Cout S
0 0 0     0    0
0 0 1     0    1
0 1 0     0    1
0 1 1     1    0
1 0 0     0    1
1 0 1     1    0
1 1 0     1    0
1 1 1     1    1

Cout = xy + x·Cin + y·Cin = xy + (x ⊕ y)·Cin
S = x'y'Cin + x'y·Cin' + xy'·Cin' + xy·Cin = x ⊕ y ⊕ Cin = (x ⊕ y) ⊕ Cin
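The half-adder and full-adder equations above can be checked with a small Python sketch (added for illustration; it simply replays the truth tables):

def half_adder(x, y):
    return x & y, x ^ y                  # carry c = xy, sum s = x XOR y

def full_adder(x, y, cin):
    cout = (x & y) | ((x ^ y) & cin)     # Cout = xy + (x XOR y)Cin
    s = x ^ y ^ cin                      # S = x XOR y XOR Cin
    return cout, s

print("x y cin | cout s")
for x in (0, 1):
    for y in (0, 1):
        for cin in (0, 1):
            cout, s = full_adder(x, y, cin)
            print(x, y, cin, "|", cout, s)   # matches the full-adder truth table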


Decoder: a decoder takes n inputs and gives 2^n outputs. That is, we get 8 outputs for 3 inputs, and this is called a 3×8 decoder.

We also have 2×4 decoders, 4×16 decoders and so on.

We can implement a decoder with the help of NAND gates; using NAND gates, it becomes more economical.
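A decoder is easy to model in software; the following Python sketch (illustrative only) builds an active-high n-to-2^n decoder:

def decode(n, value):
    outputs = [0] * (2 ** n)             # one output line per input combination
    outputs[value] = 1                   # exactly one line goes high
    return outputs

print(decode(3, 5))                      # 3x8 decoder, input 101 -> line 5 high

A NAND-gate implementation would be active-low, i.e. the selected line would go to 0 while the rest stay at 1.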


Summary:
Combinational circuits: circuits where the outputs are obtained from the inputs only. The various combinational circuits are:
o Multiplexers: the number of selection inputs depends on the number of inputs in the manner 2^x = y.
o Half Adder: adds 2 bits and gives carry and sum as result.
o Full Adder: adds 2 bits with carry in, and gives carry out and sum as result.
o Encoder: takes 2^n inputs and gives n outputs.
o Decoder: takes n inputs and gives 2^n outputs.

Important Questions derived from this:
Q1. What is the difference between a multiplexer and a decoder?
Q2. Draw a 4*1 decoder with the help of AND gates.


Lecture – 3:

• Sequential logic Blocks
  - Latches
  - Flip flops
  - Registers
  - Counters

• Sequential logic blocks: logic blocks whose output logic value depends on the input values and on the state of the blocks.
  – In these we have the concept of memory, which was not applicable to combinational circuits.

The various sequential blocks or circuits are:

Latches:
• A latch is a kind of bistable multivibrator, an electronic circuit which has two stable states and thereby can store one bit of information. Today the word is mainly used for simple transparent storage elements, while slightly more advanced non-transparent (or clocked) devices are described as flip-flops. Informally, as this distinction is quite new, the two words are sometimes used interchangeably.

S-R latch:

To overcome the restricted combination, one can add gates to the inputs that would convert (S,R) = (1,1) to one of the non-restricted combinations. That can be:

Q = 1 (1,0) — referred to as an S-latch
Q = 0 (0,1) — referred to as an R-latch
Keep state (0,0) — referred to as an E-latch

D-LATCH
Forbidden input values are forced not to occur by using an inverter between the inputs.

Flip Flops:

D – flip flop:

[Figure: D latch and D flip-flop – data input D, enable/clock input E, outputs Q and Q'.]

Characteristic table:

D   Q(t+1)
0   0
1   1


If you compare the D flip-flop and the D latch, the only difference you find in the circuit is that latches do not have clocks while flip-flops do.

So you can note down the differences between latches and flip-flops as:
• A latch is a level-triggered device whereas a flip-flop is an edge-triggered one.
• The output of a latch changes independently of a clock signal, whereas the output of a flip-flop changes at specific times determined by a clocking signal.
• Latches do not require clock pulses; flip-flops are clocked devices.

Characteristics:
- In an edge-triggered flip-flop (positive), the state transition occurs at the rising edge (or falling edge) of the clock pulse, and the device responds to the input only at that instant.
- Latches respond to the input during the whole period in which the enable signal is active.
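The timing difference can be illustrated with a small Python model (an added sketch, not from the notes): the latch copies D whenever the enable is high, while the flip-flop samples D only on a rising clock edge.

class DLatch:
    def __init__(self):
        self.q = 0
    def step(self, d, enable):
        if enable:                       # transparent while enable is high
            self.q = d
        return self.q

class DFlipFlop:
    def __init__(self):
        self.q = 0
        self.prev_clk = 0
    def step(self, d, clk):
        if clk == 1 and self.prev_clk == 0:   # update only on the rising edge
            self.q = d
        self.prev_clk = clk
        return self.q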


Counters: A counter is a device which stores (and sometimes displays) the number of times a particular event or process has occurred, often in relationship to a clock signal.

4 – bit binary counter:

RING COUNTER:

In a ring counter the output of each flip-flop is moved to the input of the next flip-flop, with the output of the last flip-flop fed back to the input of the first.

[Figure: 4-bit binary counter – four JK flip-flops A0–A3 in cascade, with a common clock, a count-enable input, and an output carry.]


JOHNSON COUNTER:

In a Johnson counter the output of the last flip-flop is inverted and given to the input of the first flip-flop, as simulated below.
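Both counters are just shift registers with different feedback, which a short Python simulation makes clear (an illustrative sketch, not part of the notes):

def ring_step(state):                    # last flip-flop feeds the first
    return [state[-1]] + state[:-1]

def johnson_step(state):                 # last output is inverted before feedback
    return [1 - state[-1]] + state[:-1]

state = [1, 0, 0, 0]                     # 4-bit ring counter: period 4
for _ in range(4):
    print(state)
    state = ring_step(state)

state = [0, 0, 0, 0]                     # 4-bit Johnson counter: period 2*4 = 8
for _ in range(8):
    print(state)
    state = johnson_step(state)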

Registers: It refers to a group of flip-flops operating as a coherent unit to hold data. This is different from a counter, which is a group of flip-flops operating to generate new data by tabulating it.


Shift register: A register that is capable of shifting data one bit at a time is called a shift register. The logical configuration of a serial shift register consists of a chain of flip-flops connected in cascade, with the output of one flip-flop being connected to the input of its neighbor. The operation of the shift register is synchronous; thus each flip-flop is connected to a common clock. Using D flip-flops forms the simplest type of shift-registers.

Bi-directional shift register with parallel load:

[Figure: bidirectional shift register with parallel load – D flip-flops A0–A3, each fed by a 4×1 MUX that selects shift-right serial input, shift-left serial input, parallel inputs I0–I3, or hold, under select lines S1 S0 and a common clock.]

Summary:


Sequential circuits: the output logic value depends on the input values and on the state of the blocks. These circuits have memory. The various sequential circuits are:
o Latches: an electronic circuit which has two stable states and thereby can store one bit of information.
o Flip flops: also have 2 stable states, but are clocked.
o Counter: a device which stores the number of times a particular event or process has occurred.
o Registers: a group of flip-flops operating as a coherent unit to hold data.

Important Questions derived from this:
Q1. What is the difference between a latch and a flip-flop?
Q2. Explain the Johnson counter.
Q3. Draw a shift register with parallel load.


Lecture – 4:

• Stored Program control concept
• Flynn's classification of computers:
  – SISD
  – SIMD
  – MISD
  – MIMD

After the discussion of the basic principles of hardware and the combinational and sequential circuits we have in our computer system, let's see how these components interact to make the computer system we use. We will be starting with the basic architectures of the computer system, and the most basic question is how the programs are stored in our computer system, or how the different programs and data are arranged in our system.

Stored Program control concept

• The simplest way to organize a computer is to have one processor register and an instruction code with 2 parts:
  – Opcode (what operation is to be performed)
  – Address (address of the operands on which the operation is to be computed)
• A computer that by design includes an instruction set architecture and can store in memory a set of instructions (a program) that details the computation, and the data on which the computation is to be done.


• The opcode tells us the operation to be performed.
• The address tells us the memory location where to find the operand.
• For a memory unit of 4096 words we need 12 bits to specify the address (2^12 = 4096).

Instruction format: bits 15–12 hold the opcode, bits 11–0 hold the address. A data word is a plain 16-bit (15–0) binary operand.

[Fig 1: Stored Program Organization – a 4096×16 memory holding instructions (program) and operands (data), connected to a processor register (accumulator, or AC).]


• When we store an instruction code in memory, 4 bits are available to specify one of 16 operations (as 12 bits are used for the operand address).

• To perform an operation, the control unit fetches the instruction from memory, decodes the operation (one out of 16), finds the operands, and then performs the operation.

• Computers with one processor register generally name it accumulator (or AC). The operation is performed with the operand and the content of AC.

• In case no operand is specified, the operation is computed on the accumulator itself, e.g. clear AC, complement AC, etc.
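The 16-bit instruction word described above can be decoded with two masks, as in this illustrative Python sketch (the sample instruction word is made up):

def decode(instruction):
    opcode = (instruction >> 12) & 0xF   # bits 15-12: one of 16 operations
    address = instruction & 0xFFF        # bits 11-0: one of 4096 addresses
    return opcode, address

op, addr = decode(0x2F10)                # sample word: opcode 2, address 0xF10
print(op, hex(addr))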

PARALLEL COMPUTERS
The organization we studied was a very basic one, but sometimes we have very large computations for which one processor with a general architecture will not be of much help. Thus we take the help of many processors, or divide the processor's functions among many functional units, or perform the same computation on many data values. To cover all these solutions we have various types of computers.

Architectural Classification – Flynn's classification:
• Based on the multiplicity of Instruction Streams and Data Streams
• Instruction Stream: the sequence of instructions read from memory
• Data Stream: the operations performed on the data in the processor

Fig 2: Classification according to instruction and data streams

• There are a variety of ways parallel processing can be classified.
• M. J. Flynn considered the organization of a computer system by the number of instructions and data items manipulated simultaneously.
• The normal operation of a computer is to fetch instructions from memory and execute them in the processor.

                                  Number of Data Streams
                                  Single      Multiple
Number of Instruction   Single    SISD        SIMD
Streams                 Multiple  MISD        MIMD


• The sequence of instructions read from memory constitutes an instruction stream.

• The operations performed on the data in the processor constitute a data stream.

• Parallel processing can be implemented with either instruction stream, data stream or both.

SISD COMPUTER SYSTEMS

SISD (Single instruction single data stream) is the simplest computer available. It contains no parallelism: it has a single instruction and a single data stream. The instructions associated with SISD are executed sequentially, and the system may or may not have external parallel processing capabilities.

Fig 3: SISD Architecture

Characteristics:
- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time

Limitations: the von Neumann bottleneck – the maximum speed of the system is limited by the memory bandwidth (bits/sec or bytes/sec):
- Limitation on memory bandwidth
- Memory is shared by CPU and I/O

Examples: superscalar processors, super-pipelined processors, VLIW

MISD COMPUTER SYSTEMS

MISD (Multiple instruction, single data stream) is of no practical use, as there is hardly a case where many instructions are executed on a single data stream.

[Fig 3 block diagram: a control unit sends an instruction stream to a single processor unit, which exchanges a single data stream with memory.]


Fig 4: MISD Architecture

• Characteristics
  - There is no computer at present that can be classified as MISD

SIMD COMPUTER SYSTEMS

SIMD (Single instruction multiple data stream) is a computer where a single instruction operates on different sets of data. It executes with the help of many processing units controlled by a single control unit. The shared memory must contain multiple modules so that it can communicate with all the processors at the same time.

• Main memory is used for storage of programs.
• The master control unit decodes the instructions and determines the operations to be executed.

[Fig 4 block diagram: MISD – memory modules M1...Mn feed control units CU1...CUn, all of whose processor units P1...Pn operate on a single data stream.]

[Fig 5 block diagram: SIMD – one control unit broadcasts the instruction stream to processor units P1...Pn, which reach memory modules M1...Mn through an alignment network and a data bus.]


Fig 5: SIMD Architecture

• Characteristics
  - Only one copy of the program exists
  - A single controller executes one instruction at a time

Examples: array processors, systolic arrays, associative processors

MIMD COMPUTER SYSTEMS

MIMD (Multiple instruction, multiple data stream) refers to a computer system where we have different processing elements working on different data. In this class we have the various multiprocessors and multicomputers.

• Characteristics
  - Multiple processing units
  - Execution of multiple instructions on multiple data

Fig 6: MIMD Architecture

• Types of MIMD computer systems
  - Shared memory multiprocessors
    • UMA
    • NUMA
  - Message-passing multicomputers

SHARED MEMORY MULTIPROCESSORS

Example systems:
- Bus and cache-based systems: Sequent Balance, Encore Multimax
- Multistage IN-based systems: Ultracomputer, Butterfly, RP3, HEP

[Fig 6 block diagram: MIMD – processors P1...Pn with local memories M1...Mn connected through an interconnection network to a shared memory.]


- Crossbar switch-based systems: C.mmp, Alliant FX/8

Limitations: memory access latency; hot spot problem

SHARED MEMORY MULTIPROCESSORS (UMA)

Fig 7: Uniform Memory Access (UMA)

Characteristics: All processors have equally direct access to one large memory address space. The access time to reach that memory is thus the same for all processors, hence the name UMA.

SHARED MEMORY MULTIPROCESSORS (NUMA)

[Figs 7 and 8 block diagrams: processors P1...Pn reaching memory modules M1...Mn through an interconnection network; in the NUMA case each processor additionally has a local memory M.]

Fig 8: NUMA (Non-Uniform Memory Access)

Characteristics: All processors share one large memory address space, but each processor also has its own local memory. The access time to the different memories therefore differs from processor to processor, hence the name NUMA.

MESSAGE-PASSING MULTICOMPUTER

Fig 9: Message-passing multicomputer architecture

Characteristics:
- Interconnected computers
- Each processor has its own memory, and communicates via message-passing

Example systems:
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III

Limitations:
- Communication overhead
- Hard to program

Summary:
 Stored Program Control Concept: in this type of organization, a program (its instructions) and its data are stored in memory, in separate areas.
 Flynn's classification of computers: it divides the processing work into data streams and instruction streams, resulting in:

[Fig 9 block diagram: processors P1...Pn, each with a private memory M, linked by a message-passing network over point-to-point connections.]


o SISD (Single instruction, Single data)
o SIMD (Single instruction, Multiple data)
o MISD (Multiple instruction, Single data)
o MIMD (Multiple instruction, Multiple data)

Important Questions:
Q1. Explain the stored program control concept.
Q2. Explain Flynn's classification of computers.
Q3. Describe the concepts of data stream and instruction stream.

Lecture – 5:

MULTILEVEL VIEWPOINT OF A MACHINE
 MACRO ARCHITECTURE
 ISA
 MICRO ARCHITECTURE
CPU
CACHES
MAIN MEMORY AND SECONDARY MEMORY UNITS
INPUT / OUTPUT MAPPING

After the discussion of the stored program control concept and the various types of parallel computers, let's study the different components of the computer structure.

MULTILEVEL VIEWPOINT OF A MACHINE
Our computer is built in various layers.

These layers are basically divided into:
- Software layer
- Hardware layer
- Instruction Set Architecture


Fig 1: Multilevel viewpoint of a machine

Computer system architecture is decided on the basis of the type of applications or usage of the computer.

The computer architect decides the different layers and the function of each layer for a specific computer. These layers, and the functions of each, can vary from one organization to another.

Our layered architecture is basically divided into 3 parts:

Macro-Architecture: as a unit of deployment (e.g., client applications and COM servers). Computer architecture is the conceptual design and fundamental operational structure of a computer system. It is a blueprint and functional description of requirements (especially speeds and interconnections) and design implementations for the various parts of a computer.

• This is basically the software layer of the computer.
• It comprises:

– User Application layer
The user layer basically gives the user an interface to the computer for which it is designed. At this layer the user specifies what processing has to be done; the requirements given by the user have to be implemented by the computer architect with the help of the other layers.

– High level language

[Fig 1 layer diagram – Macro architecture (software layer): user application layer; OS (MS-DOS, Windows, UNIX/LINUX); compiler; assembler. Instruction Set Architecture (ISA). Micro architecture (hardware layer): processor, memory, I/O system; data path and control; gate level design; circuit level design; silicon layout layer.]


A high-level programming language is a programming language with strong abstraction from the details of the computer. In comparison to low-level programming languages, it may use natural language elements, be easier to use, or be more portable across platforms. Such languages hide the details of CPU operations such as memory access models and management of scope. E.g. C/Fortran/Pascal; these are not computer dependent.

– Assembly languageAssembly Language refers to the lowest-level human-readable method for programming a particular computer. Assembly Languages are platform specific, and therefore there is a different Assembly Language necessary for programming every different type of computer.

– Machine languageMachine languages consist entirely of numbers and are almost impossible for humans to read and write.

– Operating systemOperating systems interface with hardware to provide the necessary services for application software. E.g. OS, LINUX, UNIX etc.

• Functions of an Operating system:
– Process management
– Memory management
– File management
– Device management
– Error detection
– Security

• Types of Operating system:
– Multiprogramming Operating System
– Multiprocessing Operating System
– Time Sharing Operating System
– Real Time Operating System
– Distributed Operating System
– Network Operating System

– Compiler
Software that translates a program written in a high-level programming language (C/C++, COBOL, etc.) into machine language. A compiler usually generates assembly language first and then translates the assembly language into machine language. A utility known as a "linker" then combines all required machine language modules into an executable program that can run in the computer.


– Assembler is the software that translates assembly language into machine language. Contrast with compiler, which is used to translate a high-level language, such as COBOL or C, into assembly language first and then into machine language.

Instruction set architecture: This is an abstraction on the interface between the hardware and the low-level software. It deals with the functional behaviour of a computer system as viewed by a programmer. Computer organization, by contrast, deals with structural relationships that are not visible to the programmer. The instruction set architecture is the attribute of a computing system as seen by the assembly language programmer or compiler.

The ISA is determined by:
- Data Storage
- Memory Addressing Modes
- Operations in the Instruction Set
- Instruction Formats
- Encoding the Instruction Set
- Compiler's View

Micro-Architecture: inside a unit of deployment (e.g., a running process, COM apartment, thread concurrency and synchronization, memory sharing).

Micro architecture, also known as computer organization, is a lower-level, more concrete description of the system that involves how the constituent parts of the system are interconnected and how they interoperate in order to implement the ISA. The size of a computer's cache, for instance, is an organizational issue that generally has nothing to do with the ISA.

Processor, memory, I/O system – these are the basic hardware devices required for the processing of any system application.

Data path and control – different computers have different numbers and types of registers and other logic circuits. The data path and control decide the flow of information between the various parts and circuits of the computer system.

Gate level design – circuits such as registers, counters etc. are implemented in the form of the various available gates.

Circuit level design – to combine the gates into a logical circuit or component we have the basic circuit level design, which ultimately gives birth to all the hardware components of a computer system.

Silicon layout layer

Other than the architecture of the computer, we have some very basic units which are important for our computer.


Memory units:

Main Memory: The main memory of the computer is also known as RAM, standing for Random Access Memory. It is constructed from integrated circuits and needs electrical power in order to maintain its information; when power is lost, the information is lost too! It can be directly accessed by the CPU.

Caches: A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. Cache memory is random access memory (RAM) that a computer microprocessor can access more quickly than it can access regular RAM. As the microprocessor processes data, it looks first in the cache memory and if it finds the data there (from a previous reading of data), it does not have to do the more time-consuming reading of data from larger memory.

Secondary Memory: secondary memory, which is sometimes called backing store or external memory, allows the permanent storage of large quantities of data. Examples: hard disk, floppy disk, CDs etc.

CPU: A central processing unit (CPU) is a machine that can execute computer programs. The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The program is represented by a series of numbers that are kept in some kind of computer memory. There are four steps that nearly all CPUs use in their operation: fetch, decode, execute, and writeback.

I/O units: I/O refers to the communication between an information processing system (such as a computer), and the outside world – possibly a human, or another information processing system. Inputs are the signals or data received by the system, and outputs are the signals or data sent from it.

Summary:
 The multilevel viewpoint of a machine describes the complete structure of the computer system in a hierarchical manner, which comprises:
o Macro Architecture: software components
   Operating system
   High level language
   Assembly language
   Compiler
   Assembler
o Micro Architecture: hardware components
o ISA: how the hardware components and software components are connected. It describes:
   Data Storage
   Memory Addressing Modes
   Operations in the Instruction Set


   Instruction Formats
   Encoding the Instruction Set
   Compiler's View

Other than the structured organization of the computer, the other important elements are:
o Memory
o CPU
o I/O

Important Questions:
Q1. Explain the multi-level viewpoint of a machine.
Q2. Describe micro architecture.
Q3. Describe macro architecture.
Q4. Explain ISA and why we call it a link between the hardware and software components.
Q5. What is an operating system?


Lecture – 6:

• CPU performance measures
• MIPS
• MFLOPS

After the discussion of all the elements of computer structure in the previous topics, we describe the performance of a computer in this lecture with the help of performance metrics.

• Performance of a machine is determined by:
  – Instruction count
  – Clock cycle time
  – Clock cycles per instruction
• Processor design (datapath and control) will determine:
  – Clock cycle time
  – Clock cycles per instruction
• Single cycle processor – one clock cycle per instruction
  – Advantages: simple design, low CPI
  – Disadvantages: long cycle time, which is limited by the slowest instruction

• We have different methods to calculate the performance of a CPU, or to compare two CPUs, but the result depends strongly on what type of instructions we give to these CPUs.
• The two measures we generally use are:
  – MIPS
  – MFLOPS

MIPS:
• For a specific program running on a specific computer, MIPS is a measure of how many millions of instructions are executed per second:

MIPS = Instruction count / (Execution time × 10^6)
     = Instruction count / (CPU clocks × Cycle time × 10^6)
     = (Instruction count × Clock rate) / (Instruction count × CPI × 10^6)
     = Clock rate / (CPI × 10^6)

[Figure: the three factors that determine performance – instruction count, CPI and cycle time.]


• Faster execution time usually means a higher MIPS rating.

MIPS is a useful measure, but it also has some pitfalls.

Problems with the MIPS rating:
- It takes no account of the instruction set used.
- Program-dependent: a single machine does not have a single MIPS rating, since the rating may depend on the program used.
- Easy to abuse: the program used to get the MIPS rating is often omitted.
- It cannot be used to compare computers with different instruction sets.
- A higher MIPS rating in some cases may not mean higher performance or better execution time, e.g. due to compiler design variations.

• Example: consider a machine with the instruction classes shown in the first table below.

• For a given program, two compilers produced the instruction counts shown in the second table below.

• The machine is assumed to run at a clock rate of 100 MHz.

MIPS = Clock rate / (CPI × 10^6) = 100 MHz / (CPI × 10^6)
CPI = CPU execution cycles / Instruction count
CPU time = Instruction count × CPI / Clock rate

• For compiler 1:
  – CPI1 = (5 × 1 + 1 × 2 + 1 × 3) / (5 + 1 + 1) = 10 / 7 ≈ 1.43
  – MIPS1 = (100 × 10^6) / (1.43 × 10^6) ≈ 70.0
  – CPU time1 = ((5 + 1 + 1) × 10^6 × 1.43) / (100 × 10^6) = 0.10 seconds

• For compiler 2:
  – CPI2 = (10 × 1 + 1 × 2 + 1 × 3) / (10 + 1 + 1) = 15 / 12 = 1.25
  – MIPS2 = (100 × 10^6) / (1.25 × 10^6) = 80.0
  – CPU time2 = ((10 + 1 + 1) × 10^6 × 1.25) / (100 × 10^6) = 0.15 seconds

Instruction class    CPI
A                    1
B                    2
C                    3

Instruction counts (in millions) for each instruction class:

Code from:     A     B     C
Compiler 1     5     1     1
Compiler 2    10     1     1
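The following Python sketch (added as a worked check of the example above) recomputes CPI, MIPS and CPU time for both compilers:

CLOCK = 100e6                            # 100 MHz
CPI_CLASS = {"A": 1, "B": 2, "C": 3}     # CPI per instruction class

def stats(counts_in_millions):
    insts = sum(counts_in_millions.values()) * 1e6
    cycles = sum(CPI_CLASS[c] * n * 1e6 for c, n in counts_in_millions.items())
    cpi = cycles / insts
    mips = CLOCK / (cpi * 1e6)           # MIPS = clock rate / (CPI x 10^6)
    cpu_time = insts * cpi / CLOCK
    return cpi, mips, cpu_time

print(stats({"A": 5, "B": 1, "C": 1}))   # compiler 1: ~1.43, ~70 MIPS, 0.10 s
print(stats({"A": 10, "B": 1, "C": 1}))  # compiler 2: 1.25, 80 MIPS, 0.15 s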


MFLOPS:
• MFLOPS, for a specific program running on a specific computer, is a measure of millions of floating-point operations (megaflops) per second:

MFLOPS = Number of floating-point operations / (Execution time × 10^6)

• MFLOPS is a better comparison measure between different machines than MIPS.

This is better than MIPS but it also has some pitfalls.

Problems with MFLOPS:
• A floating-point operation is an addition, subtraction, multiplication, or division operation applied to numbers represented by a single or double precision floating-point representation.
• Program-dependent: different programs have different percentages of floating-point operations; e.g. a compiler has almost no floating-point operations and yields a MFLOPS rating near zero.
• Dependent on the type of floating-point operations present in the program.

Summary:
 Performance of a machine is determined by:
• Instruction count
• Clock cycle time
• Clock cycles per instruction

MIPS = Instruction count / (Execution time × 10^6)

MFLOPS = Number of floating-point operations / (Execution time × 10^6)

Important Questions:
Q1. What is MIPS?
Q2. What is MFLOPS?
Q3. What is the difference between MIPS and MFLOPS?
Q4. What are CPU performance measures?


Lecture – 7:

 Cache Memory
 Main Memory
 Secondary Memory

We have basically 3 types of memory attached to our processor:
 Cache Memory
 Main Memory
 Secondary Memory

Primary storage, presently known simply as memory, is the only storage directly accessible to the CPU. The CPU continuously reads instructions stored there and executes them as required. Any data actively operated on is also stored there in a uniform manner.

There are two more sub-layers of primary storage, besides the main large-capacity RAM:

Processor registers are located inside the processor. Each register typically holds a word of data (often 32 or 64 bits). CPU instructions instruct the arithmetic and logic unit to perform various calculations or other operations on this data (or with the help of it). Registers are technically among the fastest of all forms of computer data storage.

Processor cache is an intermediate stage between ultra-fast registers and much slower main memory. It's introduced solely to increase performance of the computer. Most actively used information in the main memory is just duplicated in the cache memory, which is faster, but of much lesser capacity. On the other hand it is much slower, but much larger than processor registers. Multi-level hierarchical cache setup is also commonly used—primary cache being smallest, fastest and located inside the processor; secondary cache being somewhat larger and slower.

These are the types of memory accessed when we work with the processor. But if we have to store some data permanently, we need to take the help of secondary or auxiliary memory.

Secondary memory (or secondary storage) is the slowest and cheapest form of memory. It cannot be processed directly by the CPU; it must first be copied into primary storage (also known as RAM).

Secondary memory devices include magnetic disks like hard drives and floppy disks; optical discs such as CDs and CD-ROMs; and magnetic tapes, which were the first forms of secondary memory.

Primary memory                            Secondary memory
1. Fast                                   1. Slow
2. Expensive                              2. Cheap
3. Low capacity                           3. Large capacity
4. Connects directly to the processor     4. Not connected directly to the processor

Hard Disks:

Hard disks, like cassette tapes, use magnetic recording techniques – the magnetic medium can be easily erased and rewritten, and it will "remember" the magnetic flux patterns stored onto the medium for many years.

A hard drive consists of platters, a control circuit board and interface parts.

A hard disk is a sealed unit containing a number of platters in a stack. Hard disks may be mounted in a horizontal or a vertical position. In this description, the hard drive is mounted horizontally.

Electromagnetic read/write heads are positioned above and below each platter. As the platters spin, the drive heads move in toward the center surface and out toward the edge. In this way, the drive heads can reach the entire surface of each platter.

On a hard disk, data is stored in thin, concentric bands. A drive head, while in one position can read or write a circular ring, or band called a track. There can be more than a thousand tracks on a 3.5-inch hard disk. Sections within each track are called sectors. A sector is the smallest physical storage unit on a disk, and is almost always 512 bytes (0.5 kB) in size.

The stack of platters rotates at a constant speed. The drive head, while positioned close to the center of the disk, reads from a surface that is passing by more slowly than the surface at the outer edges of the disk. To compensate for this physical difference, tracks near the outside of the disk are less densely populated with data than the tracks near the center of the disk. The result of the different data density is that the same amount of data can be read over the same period of time from any drive head position.

The disk space is filled with data according to a standard plan. One side of one platter contains space reserved for hardware track-positioning information and is not available to the operating system. Thus, a disk assembly containing two platters has three sides available for data. Track-positioning data is written to the disk during assembly at the factory. The system disk controller reads this data to place the drive heads in the correct sector position.

Magnetic Tapes:

An electric current in a coil of wire produces a magnetic field similar to that of a bar magnet, and that field is much stronger if the coil has a ferromagnetic (iron-like) core.

Tape heads are made from rings of ferromagnetic material with a gap where the tape contacts it so the magnetic field can fringe out to magnetize the emulsion on the tape. A coil of wire around the ring carries the current to produce a magnetic field proportional to the signal to be recorded. If an already magnetized tape is passed beneath the head, it can induce a voltage in the coil. Thus the same head can be used for recording and playback.


Lecture – 8:

• Instruction Set based classification of computers
  – Three address instructions
  – Two address instructions
  – One address instructions
  – Zero address instructions
  – RISC instructions
  – CISC instructions
  – RISC vs CISC

In the last chapter we discussed the various architectures and the layers of the computer architecture. In this chapter we are explaining the middle layer of the multilevel view point of a machine i.e. Instruction Set Architecture.

Instruction Set Architecture (ISA) is an abstraction on the interface between the hardware and the low-level software.

It comprises:
- Instruction Formats
- Memory Addressing Modes
- Operations in the Instruction Set
- Encoding the Instruction Set
- Data Storage
- Compiler's View

Instruction Format
It is the representation of the instruction. It contains the various instruction fields:
- Opcode field – specifies the operation to be performed
- Address field(s) – designate memory address(es) or processor register(s)
- Mode field(s) – determine how the address field is to be interpreted, to get the effective address or the operand

• The number of address fields in the instruction format depends on the internal organization of the CPU.
• The three most common CPU organizations:

- Single accumulator organization:
  ADD X            /* AC ← AC + M[X] */

- General register organization:
  ADD R1, R2, R3   /* R1 ← R2 + R3 */
  ADD R1, R2       /* R1 ← R1 + R2 */
  MOV R1, R2       /* R1 ← R2 */
  ADD R1, X        /* R1 ← R1 + M[X] */

- Stack organization:
  PUSH X           /* TOS ← M[X] */
  ADD

Address Instructions:

Three-address instructions
- Program to evaluate X = (A + B) * (C + D):
  ADD R1, A, B   /* R1 ← M[A] + M[B] */
  ADD R2, C, D   /* R2 ← M[C] + M[D] */
  MUL X, R1, R2  /* M[X] ← R1 * R2 */

- Results in short programs
- Instructions become long (many bits)

• Two-address instructions
- Program to evaluate X = (A + B) * (C + D):
  MOV R1, A      /* R1 ← M[A] */
  ADD R1, B      /* R1 ← R1 + M[B] */
  MOV R2, C      /* R2 ← M[C] */
  ADD R2, D      /* R2 ← R2 + M[D] */
  MUL R1, R2     /* R1 ← R1 * R2 */
  MOV X, R1      /* M[X] ← R1 */

• One-address instructions
- Use an implied AC register for all data manipulation
- Program to evaluate X = (A + B) * (C + D):
  LOAD A         /* AC ← M[A] */
  ADD B          /* AC ← AC + M[B] */
  STORE T        /* M[T] ← AC */
  LOAD C         /* AC ← M[C] */
  ADD D          /* AC ← AC + M[D] */
  MUL T          /* AC ← AC * M[T] */
  STORE X        /* M[X] ← AC */

• Zero-address instructions
- Found in stack-organized computers
- Program to evaluate X = (A + B) * (C + D):
  PUSH A         /* TOS ← A */
  PUSH B         /* TOS ← B */
  ADD            /* TOS ← (A + B) */
  PUSH C         /* TOS ← C */
  PUSH D         /* TOS ← D */
  ADD            /* TOS ← (C + D) */
  MUL            /* TOS ← (C + D) * (A + B) */
  POP X          /* M[X] ← TOS */
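A stack-organized machine is easy to mimic in software. The tiny Python sketch below (illustrative only; the memory contents are made up) runs the zero-address program above:

M = {"A": 2, "B": 3, "C": 4, "D": 5}     # made-up memory operands
stack = []

def PUSH(x): stack.append(M[x])
def ADD():   stack.append(stack.pop() + stack.pop())
def MUL():   stack.append(stack.pop() * stack.pop())
def POP(x):  M[x] = stack.pop()

PUSH("A"); PUSH("B"); ADD()              # TOS = A + B
PUSH("C"); PUSH("D"); ADD()              # TOS = C + D
MUL(); POP("X")                          # M[X] = (A + B) * (C + D)
print(M["X"])                            # (2 + 3) * (4 + 5) = 45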

CISC (Complex Instruction Set Computer)

• Computers with many instructions and addressing modes came to be known as Complex Instruction Set Computers (CISC).


• One goal for CISC machines was to have a machine language instruction to match each high-level language statement type.

Criticisms of CISC:
- Complex instructions → complex formats, lengths and addressing modes → complicated instruction cycle control, due to the complex decoding hardware and decoding process
- Multiple-memory-cycle instructions → operations on memory data → multiple memory accesses per instruction
- Microprogrammed control is a necessity → microprogram control storage takes a substantial portion of the CPU chip area → the semantic gap between machine instruction and microinstruction is large
- A general-purpose instruction set includes all the features required by individually different applications → when any one application is running, the features required by the other applications are an extra burden on it

RISC

In the late '70s to early '80s there was a reaction to the shortcomings of the CISC style of processors:
– Reduced Instruction Set Computers (RISC) were proposed as an alternative.

• The underlying idea behind RISC processors is to simplify the instruction set and reduce instruction execution time.

Note: in RISC-type instructions, we cannot access memory operands directly.

Evaluate X = (A + B) * (C + D):
  MOV R1, A       /* R1 ← M[A] */
  MOV R2, B       /* R2 ← M[B] */
  ADD R1, R1, R2  /* R1 ← R1 + R2 */
  MOV R2, C       /* R2 ← M[C] */
  MOV R3, D       /* R3 ← M[D] */
  ADD R2, R2, R3  /* R2 ← R2 + R3 */
  MUL R1, R1, R2  /* R1 ← R1 * R2 */
  MOV X, R1       /* M[X] ← R1 */

• RISC processors often feature:
  – Few instructions
  – Few addressing modes
  – Only load and store instructions access memory
  – All other operations are done using on-processor registers
  – Fixed length instructions
  – Single cycle execution of instructions
  – A hardwired, not microprogrammed, control unit

• Since all instructions (but load and store) use only registers for operands, only a few addressing modes are needed.
• By having all instructions the same length, reading them in is easy and fast.
• The fetch and decode stages are simple, looking much more like Mano's BC than a CISC machine:
  – The instruction and address formats are designed to be easy to decode
  – (Unlike the variable-length CISC instructions,) the opcode and register fields of RISC instructions can be decoded simultaneously
• The control logic of a RISC processor is designed to be simple and fast:
  – The control logic is simple because of the small number of instructions and the simple addressing modes
  – The control logic is hardwired, rather than microprogrammed, because hardwired control is faster

ADVANTAGES OF RISC

• VLSI Realization
  - Control area is considerably reduced ⇒ RISC chips allow a large number of registers on the chip
  - Enhancement of performance and HLL support
  - Higher regularization factor and lower VLSI design cost

• Computing Speed
  - Simpler, smaller control unit ⇒ faster
  - Simpler instruction set, addressing modes and instruction format ⇒ faster decoding
  - Register operation ⇒ faster than memory operation
  - Register window ⇒ enhances the overall speed of execution
  - Identical instruction length, one-cycle instruction execution ⇒ suitable for pipelining ⇒ faster

• Design Costs and Reliability
  - Shorter time to design ⇒ reduction in the overall design cost, and reduces the problem that the end product will be obsolete by the time the design is completed
  - Simpler, smaller control unit ⇒ higher reliability
  - Simple instruction format (of fixed length) ⇒ ease of virtual memory management

• High Level Language Support
  - A single choice of instruction ⇒ shorter, simpler compiler
  - A large number of CPU registers ⇒ more efficient code
  - Register window ⇒ direct support of HLL
  - Reduced burden on the compiler writer

RISC VS CISC

• The CISC Approach: the entire task of multiplying two numbers can be completed with one instruction:

  MULT 2:3, 5:2

• One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.

• The RISC Approach: in order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly:

  LOAD A, 2:3
  LOAD B, 5:2
  PROD A, B
  STORE 2:3, A

• At first, this may seem like a much less efficient way of completing the operation. Because there are more lines of code, more RAM is needed to store the assembly-level instructions. The compiler must also perform more work to convert a high-level language statement into code of this form.

RISC vs CISC:

CISC                                         RISC
Emphasis on hardware                         Emphasis on software
Transistors used for storing                 Spends more transistors
complex instructions                         on memory registers
Includes multi-clock                         Single-clock, reduced
complex instructions                         instructions only
Memory-to-memory: "LOAD" and "STORE"         Register-to-register: "LOAD" and "STORE"
incorporated in instructions                 are independent instructions
Small code sizes                             Large code sizes
High cycles per second                       Low cycles per second

Summary:
 The instruction format is composed of the opcode field, address field, and mode field.
 The different types of address instructions used are three-address, two-address, one-address and zero-address.
 RISC and CISC introduction, with their advantages and criticisms.
 RISC vs CISC.

Important Questions:
Q1. Explain the different address formats in detail with examples.
Q2. Explain RISC and CISC with their advantages and criticisms.
Q3. Numerical.

Lecture – 9:


• Addressing modes
  – Implied Mode
  – Immediate Mode
  – Register Mode
  – Register Indirect Mode
  – Autoincrement or Autodecrement Mode
  – Direct Addressing Mode
  – Indirect Addressing Mode
  – Relative Addressing Mode

In the last lecture we studied instruction formats; now we study how instructions use the different types of addressing modes.

Addressing Modes

* An addressing mode specifies a rule for interpreting or modifying the address field of the instruction (before the operand is actually referenced).
* A variety of addressing modes exist:
  - to give programming flexibility to the user
  - to use the bits in the address field of the instruction efficiently

In simple words, we can say the addressing mode is the way to fetch operands (or data) from memory.

TYPES OF ADDRESSING MODES

• Implied Mode: the address of the operands is specified implicitly in the definition of the instruction
  - No need to specify an address in the instruction
  - EA = AC, or EA = Stack[SP]
  - Examples from BC: CLA, CME, INP

• Immediate Mode: instead of specifying the address of the operand, the operand itself is specified
  - No need to specify an address in the instruction
  - However, the operand itself needs to be specified
  - (-) Sometimes requires more bits than the address
  - (+) Fast to acquire an operand
  - Useful for initializing registers to a constant value

• Register Mode: the address specified in the instruction is a register address
  - The designated operand needs to be in a register


  - (+) Shorter address than a memory address – saves address field bits in the instruction
  - (+) Faster to acquire an operand than with memory addressing
  - EA = IR(R) (IR(R): register field of IR)

• Register Indirect Mode: the instruction specifies a register which contains the memory address of the operand
  - (+) Saves instruction bits, since a register address is shorter than a memory address
  - (-) Slower to acquire an operand than either register addressing or memory addressing
  - EA = [IR(R)] ([x]: content of x)

• Autoincrement or Autodecrement Mode
  - Similar to the register indirect mode, except: when the address in the register is used to access memory, the value in the register is incremented or decremented by 1 automatically

• Direct Address Mode: the instruction specifies the memory address, which can be used directly to access the memory
  - (+) Faster than the other memory addressing modes
  - (-) Too many bits are needed to specify the address for a large physical memory space
  - EA = IR(addr) (IR(addr): address field of IR)
  - E.g. the address field in a branch-type instruction

• Indirect Addressing Mode: the address field of the instruction specifies the address of a memory location that contains the address of the operand
  - (-) Slow to acquire an operand because of the additional memory access
  - EA = M[IR(address)]

• Relative Addressing Modes: the address field of the instruction specifies part of the address (an abbreviated address), which is used along with a designated register to calculate the address of the operand
  --> Effective address = address part of the instruction + content of a special register
  - (+) Large physical memory can be accessed with a small number of address bits
  - EA = f(IR(address), R), where R is sometimes implied --> typically EA = IR(address) + R
  - 3 different relative addressing modes, depending on R:
    * (PC) Relative Addressing Mode (R = PC)
    * Indexed Addressing Mode (R = IX, where IX: index register)
    * Base Register Addressing Mode (R = BAR (base address register))
  * Indexed addressing mode vs. base register addressing mode:
    - IR(address) (address field of the instruction): base address vs. displacement
    - R (index/base register): displacement vs. base address
    - Difference: the way they are used (NOT the way the address is computed)


    * Indexed addressing mode: for processing many operands in an array using the same instruction
    * Base register addressing mode: facilitates the relocation of programs in memory in multiprogramming systems

Addressing Modes: Examples

[Figure: numerical examples of the addressing modes.]
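In place of the examples figure, here is an illustrative Python sketch of how the effective address (EA) comes out under a few of the modes; the memory contents, register values and address field are all made-up numbers:

M = {100: 500, 200: 700}                 # memory: address -> contents
PC, R1, XR = 50, 100, 10                 # sample register values
addr = 200                               # address field of the instruction

print("direct            EA =", addr)        # EA = addr      -> 200
print("indirect          EA =", M[addr])     # EA = M[addr]   -> 700
print("register indirect EA =", R1)          # EA = [R1]      -> 100
print("relative          EA =", PC + addr)   # EA = PC + addr -> 250
print("indexed           EA =", addr + XR)   # EA = addr + XR -> 210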

Summary:
 Addressing mode: specifies a rule for interpreting or modifying the address field of the instruction.
 The different types of addressing modes are: implied mode, immediate mode, register mode, register indirect mode, autoincrement or autodecrement mode, direct mode, indirect mode, and relative addressing mode.

Important Questions: Q1. Explain the addressing modes with suitable examples.


Lecture – 10:

• Instruction set
  – Data Transfer Instructions
    o Typical Data Transfer Instructions
    o Data Transfer Instructions with Different Addressing Modes
  – Data Manipulation Instructions
    o Arithmetic instructions
    o Logical and bit manipulation instructions
    o Shift instructions
  – Program Control Instructions
    o Conditional Branch Instructions
    o Subroutine Call & Return

DATA TRANSFER INSTRUCTIONS

These are the instructions used only for the transfer of data between registers, and between registers and memory. No manipulation is done on the data values.

These are instructions in which no special use is made of the various addressing modes; we have a direct transfer between the various registers and memory components.

Table 3.1: Typical Data Transfer Instructions

Name       Mnemonic
Load       LD
Store      ST
Move       MOV
Exchange   XCH
Input      IN
Output     OUT
Push       PUSH
Pop        POP


Load and store are used for the transfer of data to and from the accumulator:
  LD 20
  ST D

Move and exchange are used for data transfer between the various general purpose registers:
  MOV R1, R2
  MOV R1, X
  XCH R1, R2

Input and Output are used for the data transfer between memory and I/O devices.

Push and Pop operations are used for information flow between stack and memory.

Data Transfer Instructions with Different Addressing Modes

In these types of data transfer we use different addressing modes for loading the operand value into the accumulator register (Table 3.2).

DATA MANIPULATION INSTRUCTIONS

Three basic types:
- Arithmetic instructions
- Logical and bit manipulation instructions
- Shift instructions

Table 3.2: Data Transfer Instructions with Different Addressing Modes

Mode                Assembly Convention   Register Transfer
Direct address      LD ADR                AC ← M[ADR]
Indirect address    LD @ADR               AC ← M[M[ADR]]
Relative address    LD $ADR               AC ← M[PC + ADR]
Immediate operand   LD #NBR               AC ← NBR
Index addressing    LD ADR(X)             AC ← M[ADR + XR]
Register            LD R1                 AC ← R1
Register indirect   LD (R1)               AC ← M[R1]
Autoincrement       LD (R1)+              AC ← M[R1], R1 ← R1 + 1
Autodecrement       LD -(R1)              R1 ← R1 - 1, AC ← M[R1]


Arithmetic Instructions: these are the instructions used for arithmetical calculations like addition, subtraction, increment etc. (Table 3.3).

Logical and Bit Manipulation Instructions (Table 3.4)

These are instructions whose operations are computed on a string of bits. The bits are treated individually, so an operation can be done on a single bit or a group of bits while ignoring the value as a whole, and even the insertion of new bits is possible.

For example:
- CLR R1 will make all the bits 0.
- COM R1 will invert all the bits.
- AND, OR and XOR produce their result on the corresponding individual bits of the two operands. E.g. the AND of 0011 and 1100 gives 0000.
- The AND instruction is also known as the mask instruction: if we have to mask some bits of an operand, we AND those positions with 0, giving the other positions 1 (high). E.g. suppose we have to mask the register value 11000110 on its 1st, 3rd and 7th bits; then we AND it with the value 01011101, as checked in the sketch below.
- CLRC, SETC and COMC work on only 1 bit of the operand, i.e. the carry.
- Similarly, EI and DI work only on the 1-bit interrupt flip-flop, to enable or disable it.
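The masking example can be checked directly in Python (values taken from the notes above):

value = 0b11000110
mask  = 0b01011101                       # 0s at the 1st, 3rd and 7th bits from the left
print(format(value & mask, "08b"))       # -> 01000100: the masked bits are cleared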

Table 3.3: Arithmetic Instructions

Name                     Mnemonic
Increment                INC
Decrement                DEC
Add                      ADD
Subtract                 SUB
Multiply                 MUL
Divide                   DIV
Add with Carry           ADDC
Subtract with Borrow     SUBB
Negate (2's Complement)  NEG


Table 3.4: Logical and Bit Manipulation Instructions

Name                Mnemonic
Clear               CLR
Complement          COM
AND                 AND
OR                  OR
Exclusive-OR        XOR
Clear carry         CLRC
Set carry           SETC
Complement carry    COMC
Enable interrupt    EI
Disable interrupt   DI

Shift Instructions: these instructions modify the whole operand value by shifting its bits to the left or right (Table 3.5).

Say R1 has the value 11001100:
o SHR inserts 0 at the leftmost position.
  Result: 01100110
o SHL inserts 0 at the rightmost position.
  Result: 10011000
o SHRA: the sign bit remains the same while the remaining bits shift right.
  Result: 11100110
o SHLA is the same as SHL, inserting 0 at the end.
  Result: 10011000
o In ROR, all the bits are shifted towards the right and the rightmost one moves to the leftmost position.
  Result: 01100110
o In ROL, all the bits are shifted towards the left and the leftmost one moves to the rightmost position.
  Result: 10011001
o In RORC, suppose we have a carry bit of 0 along with register R1. All the bits of the register are right-shifted, the value of the carry moves to the leftmost position, and the rightmost bit moves to the carry.
  Result: 01100110, with carry 0
o Similarly, in ROLC all the bits of the register are left-shifted, the value of the carry moves to the rightmost position, and the leftmost bit moves to the carry.
  Result: 10011000, with carry 1
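All of the results above for R1 = 11001100 can be reproduced with a few lines of Python (an added check, restricted to 8-bit values):

def bits(v):
    return format(v & 0xFF, "08b")       # keep results to 8 bits

r1, carry = 0b11001100, 0
print(bits(r1 >> 1))                     # SHR  -> 01100110
print(bits(r1 << 1))                     # SHL  -> 10011000
print(bits((r1 >> 1) | (r1 & 0x80)))     # SHRA -> 11100110 (sign bit kept)
print(bits((r1 >> 1) | ((r1 & 1) << 7))) # ROR  -> 01100110
print(bits((r1 << 1) | (r1 >> 7)))       # ROL  -> 10011001
print(bits((r1 >> 1) | (carry << 7)), r1 & 1)       # RORC -> 01100110, carry 0
print(bits((r1 << 1) | carry), (r1 >> 7) & 1)       # ROLC -> 10011000, carry 1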

PROGRAM CONTROL INSTRUCTIONS:

Before starting with program control instructions, let's study the concept of the PC, i.e. the program counter. The program counter is the register which tells us the address of the next instruction to be executed. When we fetch the instruction pointed to by the PC from memory, it changes its value to give us the address of the next instruction to be fetched. For sequential instructions it simply increments itself; for branching or modular programs it gives us the address of the first instruction of the called program. After the execution of a called program, the program counter points back to the instruction next to the one from which the subprogram was called. For go-to kinds of instructions, the program counter simply changes its value without keeping any reference to the previous instruction.

Table 3.5: Shift Instructions

Name                     Mnemonic
Logical shift right      SHR
Logical shift left       SHL
Arithmetic shift right   SHRA
Arithmetic shift left    SHLA
Rotate right             ROR
Rotate left              ROL
Rotate right thru carry  RORC
Rotate left thru carry   ROLC


Program Control Instructions: these instructions are used for the transfer of control to other instructions, i.e. they are used when the next instruction has to be executed from some other location instead of in the sequential manner (Table 3.6).

The situations can be:
- Calling a subprogram
- Returning to the main program
- Jumping to some other instruction or location
- Skipping instructions, in case of break and exit, or when a tested condition is false, and so on

* CMP and TST instructions do not retain the results of their operations (subtraction and AND, respectively). They only set or clear certain flags.

Conditional Branch Instructions: these are instructions in which we test some condition and, depending on the result, we proceed either to the branch target or sequentially (Table 3.7).

The PC either increments by +1 (in-line sequencing: the next instruction is fetched from the next adjacent location in memory), or receives an address from another source (current instruction, stack, etc.: branch, conditional branch, subroutine, etc.).

Table 3.6: Program Control Instructions

Name                      Mnemonic
Branch                    BR
Jump                      JMP
Skip                      SKP
Call                      CALL
Return                    RTN
Compare (by subtraction)  CMP
Test (by AND)             TST


Subroutine Call and Return:

A subroutine call goes by various names: call subroutine, jump to subroutine, branch to subroutine, branch and save return address.

Two most important operations are implied:
* Branch to the beginning of the subroutine – same as a branch or conditional branch.
* Save the return address, so as to know the location in the calling program to return to upon exit from the subroutine.

Possible locations for storing the return address:
- A fixed location in the subroutine (memory)
- A fixed location in memory

Table 3.7: Conditional Branch Instructions

Mnemonic  Branch condition            Tested condition
BZ        Branch if zero              Z = 1
BNZ       Branch if not zero          Z = 0
BC        Branch if carry             C = 1
BNC       Branch if no carry          C = 0
BP        Branch if plus              S = 0
BM        Branch if minus             S = 1
BV        Branch if overflow          V = 1
BNV       Branch if no overflow       V = 0

Unsigned compare conditions (A - B):
BHI       Branch if higher            A > B
BHE       Branch if higher or equal   A ≥ B
BLO       Branch if lower             A < B
BLOE      Branch if lower or equal    A ≤ B
BE        Branch if equal             A = B
BNE       Branch if not equal         A ≠ B

Signed compare conditions (A - B):
BGT       Branch if greater than      A > B
BGE       Branch if greater or equal  A ≥ B
BLT       Branch if less than         A < B
BLE       Branch if less or equal     A ≤ B
BE        Branch if equal             A = B
BNE       Branch if not equal         A ≠ B

51

Table 3.7



Summary:
 Data Transfer Instructions are of two types: typical data transfer instructions, and data transfer instructions with different addressing modes.
 Data Manipulation Instructions are of three types: arithmetic instructions, logical and bit manipulation instructions, and shift instructions.
 Program Control Instructions can be divided into conditional branch instructions and subroutine call & return instructions.

Important Questions:
Q1. Explain the data transfer instructions.
Q2. Explain the data manipulation instructions.
Q3. Explain the program control instructions with examples.


CALL:  SP ← SP - 1;  M[SP] ← PC;  PC ← EA
RTN:   PC ← M[SP];  SP ← SP + 1
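A minimal Python sketch (with assumed example addresses, not taken from the notes) of these two microoperation sequences, using a dictionary as memory and a stack that grows toward lower addresses:

```python
memory = {}   # sparse main memory
SP = 4001     # stack pointer
PC = 0        # program counter

def call(ea):
    """CALL: SP <- SP - 1; M[SP] <- PC; PC <- EA."""
    global SP, PC
    SP -= 1
    memory[SP] = PC   # save the return address on the stack
    PC = ea           # branch to the subroutine

def rtn():
    """RTN: PC <- M[SP]; SP <- SP + 1."""
    global SP, PC
    PC = memory[SP]   # restore the return address
    SP += 1

PC = 2001             # address following the CALL instruction
call(3500)            # enter a subroutine at a hypothetical address 3500
print(PC, SP)         # 3500 4000
rtn()
print(PC, SP)         # 2001 4001
```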


Lecture – 11:

• Program Interrupts
• MASM

PROGRAM INTERRUPT:

Types of Interrupts:

1. External Interrupts: initiated from outside of the CPU and memory.
   - I/O device -> data transfer request or data transfer complete
   - Timing device -> timeout
   - Power failure
   - Operator

2. Internal Interrupts (traps): caused by the currently running program.
   - Register or stack overflow
   - Divide by zero
   - Op-code violation
   - Protection violation

3. Software Interrupts: both external and internal interrupts are initiated by the computer hardware, whereas software interrupts are initiated by an executing instruction.
   - Supervisor call -> switches from user mode to supervisor mode; this allows execution of a certain class of operations which are not allowed in user mode.

MASM:

If you have used a modern word processor such as Microsoft Word, you may have noticed the macro feature, where you can record a series of frequently used actions or commands into a macro. For example, suppose you always need to insert a 2-by-4 table with the titles "Date" and "Time". You can start the macro recorder and create the table as you wish. After that, you can save the macro. The next time you need to create the same kind of table, you just execute the macro. The same applies to a macro assembler: it enables you to record frequently performed actions, or a frequently used block of code, so that you do not have to re-type it each time.

The Microsoft Macro Assembler (abbreviated MASM) is an x86 high-level assembler for DOS and Microsoft Windows. Currently it is the most popular x86 assembler. It supports a wide variety of macro facilities and structured programming idioms, including high-level functions for looping and procedures. Later versions added the capability of producing programs for Windows. MASM is one of the few Microsoft development tools that target 16-bit, 32-bit and 64-bit platforms. Earlier versions were MS-DOS applications. Versions 5.1 and 6.0 were OS/2 applications and later versions were Win32 console applications. Versions 6.1 and 6.11 included Phar Lap's TNT DOS extender so that MASM could run in MS-DOS.

The name MASM originally referred to the Macro Assembler, but over the years it has become synonymous with Microsoft Assembler.

An Assembly language translator converts macros into several machine language instructions.

MASM isn't the fastest assembler around (it's not particularly slow, except in a couple of degenerate cases, but there are faster assemblers available). Though very powerful, there are a couple of assemblers that, arguably, are more powerful (e.g., TASM and HLA).

MASM is only usable for creating DOS and Windows applications; you cannot effectively use it to create software for other operating systems.

Benefits of MASM
There are some benefits to using MASM today:
– Steve Hutchessen's ("Hutch") MASM32 package provides the support for MASM that Microsoft no longer provides.
– You can download MASM (and MASM32) free from Microsoft and other sites.
– Most Windows assembly language examples on the Internet today use MASM syntax.
– You may download MASM directly from Webster as part of the MASM32 package.

Summary:

Program interrupts can be external, internal or software interrupts. MASM is the Microsoft macro assembler, used for implementing macros.

Important Questions:
Q1. What are program interrupts? Explain the types of program interrupts.
Q2. Explain MASM in detail.


Lecture – 12:

• CPU Architecture types
  o Accumulator
  o Register
  o Stack
  o Memory / Register
• Detailed data path of a register based CPU

Detailed data path of a register based CPU

In Unit 3 we discussed the instruction set architecture (ISA), which deals with the various types of address instructions, addressing modes and the different types of instructions in various computer architectures.

In this chapter we will discuss the various types of computer organizations we have.
• In general, most processors or computers are organized in one of 3 ways:
  – Single register (accumulator) organization
    • The Basic Computer is a good example
    • The accumulator is the only general purpose register
  – Stack organization
    • All operations are done using the hardware stack
    • For example, an OR instruction will pop the two top elements from the stack, do a logical OR on them, and push the result on the stack
  – General register organization
    • Used by most modern computer processors
    • Any of the registers can be used as the source or destination for computer operations

Accumulator type of organization: In an accumulator organization, one operand is in memory and the other is in the accumulator.

The instructions we can run with the accumulator are:

AC ← AC ∧ DR                 AND with DR
AC ← AC + DR                 Add with DR
AC ← DR                      Transfer from DR
AC(0-7) ← INPR               Transfer from INPR
AC ← AC′                     Complement
AC ← shr AC, AC(15) ← E      Shift right
AC ← shl AC, AC(0) ← E       Shift left
AC ← 0                       Clear
AC ← AC + 1                  Increment


Circuit required:
[Fig: Accumulator logic: a 16-bit AC register fed by an adder and logic circuit with a 16-bit input from DR and an 8-bit input from INPR; control gates generate the LD, INR and CLR signals, the clock synchronizes loading, and the 16-bit output goes to the common bus.]

Stack Organization:
- A stack is a very useful feature for nested subroutines and nested interrupt services
- Also efficient for arithmetic expression evaluation
- Storage which is accessed in LIFO order
- Pointer: SP
- Only PUSH and POP operations are applicable
Stack organization is of two types:



REGISTER STACK ORGANIZATION

[Fig: Register stack: a 64-word stack (addresses 0 to 63) holding items A, B, C at the bottom; a 6-bit stack pointer SP addresses the top of the stack, DR is the data register, and the FULL and EMPTY flags mark the boundary conditions.]

/* Initially, SP = 0, EMPTY = 1, FULL = 0 */

PUSH:                             POP:
SP ← SP + 1                       DR ← M[SP]
M[SP] ← DR                        SP ← SP - 1
If (SP = 0) then (FULL ← 1)       If (SP = 0) then (EMPTY ← 1)
EMPTY ← 0                         FULL ← 0
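A minimal Python sketch of this 64-word register stack, following the PUSH and POP microoperations above; the 6-bit SP wraps modulo 64, which is what makes SP = 0 signal FULL after a push and EMPTY after a pop:

```python
class RegisterStack:
    def __init__(self, size=64):
        self.mem = [0] * size
        self.size = size
        self.sp = 0                     # SP = 0, EMPTY = 1, FULL = 0 initially
        self.full, self.empty = False, True

    def push(self, dr):
        if self.full:
            raise OverflowError("stack full")
        self.sp = (self.sp + 1) % self.size   # SP <- SP + 1
        self.mem[self.sp] = dr                # M[SP] <- DR
        if self.sp == 0:
            self.full = True                  # if SP = 0 then FULL <- 1
        self.empty = False                    # EMPTY <- 0

    def pop(self):
        if self.empty:
            raise IndexError("stack empty")
        dr = self.mem[self.sp]                # DR <- M[SP]
        self.sp = (self.sp - 1) % self.size   # SP <- SP - 1
        if self.sp == 0:
            self.empty = True                 # if SP = 0 then EMPTY <- 1
        self.full = False                     # FULL <- 0
        return dr

stack = RegisterStack()
stack.push(5)
stack.push(9)
print(stack.pop(), stack.pop())   # 9 5
```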


MEMORY STACK ORGANIZATION

Memory with program, data, and stack segments:
[Fig: PC points into the program area (instructions, starting at address 1000), AR into the data area (operands, starting at 3000), and SP into the stack area (addresses 3997-4001); the stack grows toward lower addresses.]

A portion of memory is used as a stack with a processor register as a stack pointer

- PUSH: SP ← SP - 1; M[SP] ← DR
- POP:  DR ← M[SP]; SP ← SP + 1

Note: Most computers do not provide hardware to check stack overflow (full stack) or underflow (empty stack) -> this must be done in software.

Register type of organization: In this we take the help of various registers, say R1 to R7, for the transfer and manipulation of data.

Detailed data path of a typical register based CPU



To avoid accessing memory directly (which is very time consuming and thus costly), we prefer the register organization, as it proves to be a more efficient and time-saving organization.

In this we use 7 registers. Two multiplexers decide which registers are used as the operand sources, and a decoder decides which register is used as the destination for storing the result. MUX 1 selects the first operand register, depending on the value of SELS1 (selector for source 1); similarly, SELS2 is the input for MUX 2's second-operand decision.
These two operands reach the ALU through the S1 bus and the S2 bus. OPR denotes the type of operation to be performed, and the computation is carried out by the ALU. The result is then stored back into one of the 7 registers; the decoder, driven by SELD, decides which register receives it.

[Fig: Register-based CPU data path: registers R1-R7 feed two multiplexers (selected by SELS1 and SELS2) that drive the S1 and S2 buses into the ALU, whose operation is chosen by OPR; a 3 x 8 decoder driven by SELD asserts one of 7 load lines to capture the output/result into the destination register, synchronized by the clock. External input also feeds the register file.]
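A minimal Python sketch of one clock cycle of this data path; the register contents and the small set of ALU operations are assumptions for illustration:

```python
R = {i: 0 for i in range(1, 8)}   # registers R1..R7
R[1], R[2] = 7, 3

ALU = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
}

def cycle(sels1, sels2, opr, seld):
    s1_bus = R[sels1]                 # MUX 1 places the SELS1 register on the S1 bus
    s2_bus = R[sels2]                 # MUX 2 places the SELS2 register on the S2 bus
    result = ALU[opr](s1_bus, s2_bus) # OPR selects the ALU operation
    R[seld] = result                  # the 3x8 decoder raises the load line of R[seld]
    return result

cycle(1, 2, "ADD", 3)                 # R3 <- R1 + R2
print(R[3])                           # 10
```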


Lecture – 13:

• Address Sequencing / Microinstruction Sequencing
• Implementation of the control unit

Address Sequencing / Microinstruction Sequencing: Microinstructions are stored in control memory in groups, with each group specifying a routine. The hardware that controls the address sequencing of the control memory must be capable of sequencing the microinstructions within a routine, and of branching from one routine to another, with the help of the circuit shown below.

Steps:
1. An initial address is loaded into the CAR at power-on; this is usually the first microinstruction, which activates the instruction fetch routine.
2. This routine may be sequenced by incrementing.
3. At the end of the fetch routine the instruction is in the IR of the computer.
4. Next the control memory computes the effective address of the operand.
5. The next step is the execution of the instruction fetched from memory.
6. The transformation from the instruction code bits to an address in control memory where the routine is located is referred to as a mapping process.

[Fig: Selection of address for control memory: the instruction code feeds mapping logic; multiplexers, guided by branch logic and a selected status bit, choose the next address for the control address register (CAR) from the incrementer, the branch address, the mapping logic, or the subroutine register (SBR); the control memory (ROM) output drives the microoperations.]


At the completion of the execution of the instruction, control must return to the fetch routine by executing an unconditional branch microinstruction to the first address of the fetch routine.

Sequencing capabilities required in a control storage:
- Incrementing of the control address register
- Unconditional and conditional branches
- A mapping process from the bits of the machine instruction to an address for control memory
- A facility for subroutine call and return
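A minimal Python sketch (with simplified, assumed encodings, not a real control unit) of these four sequencing capabilities, showing how the next CAR value can come from the incrementer, a branch address, the mapping logic, or the subroutine register SBR:

```python
CAR, SBR = 0, 0   # control address register, subroutine register

def next_address(select, branch_addr=0, opcode=0):
    """select is one of: 'inc', 'branch', 'map', 'call', 'ret'."""
    global CAR, SBR
    if select == "inc":
        CAR += 1              # sequential microinstruction
    elif select == "branch":
        CAR = branch_addr     # unconditional/conditional branch target
    elif select == "map":
        CAR = opcode << 2     # mapping: opcode bits form the routine address
    elif select == "call":
        SBR = CAR + 1         # save the return microaddress
        CAR = branch_addr     # jump to the microsubroutine
    elif select == "ret":
        CAR = SBR             # return from the microsubroutine
    return CAR

next_address("map", opcode=0b101)   # enter the routine for opcode 101
next_address("inc")                 # step through the routine
print(CAR)                          # 21
```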

Design of the Control Unit: After obtaining the microoperations we have to execute them, but before that we need to decode them.

Fig: Decoding of microoperation fields. Because we have 8 microoperations represented by 3 bits in each field, and there are 3 such fields, we decode these microoperation field bits with three 3 x 8 decoders.

[Fig: Decoding of microoperation fields: fields F1, F2 and F3 each drive a 3 x 8 decoder; outputs such as AND and ADD select the operation of the arithmetic logic and shift unit (fed from AC and DR, loading AC), DRTAC transfers DR to AC, and PCTAR/DRTAR choose, through a 2 x 1 multiplexer, whether PC or DR(0-10) is loaded into AR; the clock synchronizes the loads.]


After decoding, we route each microoperation to its particular circuit. Data manipulation microoperations such as AND, ADD and SUB are given to the ALU, and the corresponding results are moved to AC; the ALU is provided data from AC and DR. For data transfer microoperations, as in the case of PCTAR or DRTAR, we simply need to transfer the values. Because we have two options for transferring data into AR, we take the help of a MUX to choose one: a 2 x 1 MUX whose select line is attached to the DRTAR microoperation signal. If DRTAR is high, the MUX chooses DR to transfer its data to AR; otherwise PC's data is moved to AR. The actual data movement happens only when the load input of AR is high: if either of the two signals is high, the value is loaded into AR.

The clock signal is provided for the synchronization of microoperations.


Lecture – 14:

• Fetch and decode cycle
• Control Unit

Fetch and Decode

T0: AR ← PC                        (S0S1S2 = 010, T0 = 1)
T1: IR ← M[AR], PC ← PC + 1        (S0S1S2 = 111, T1 = 1)
T2: D0, ..., D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)

[Fig: Fetch phase on the common bus: the select lines S2, S1, S0 choose which register drives the bus; at T0 the PC is placed on the bus and loaded into AR (LD), and at T1 the memory unit is read, the word is placed on the bus into IR (LD), and PC is incremented (INR); the clock synchronizes all transfers.]


Control Unit

• The control unit (CU) of a processor translates machine instructions into the control signals for the microoperations that implement them.
• Control units are implemented in one of two ways:
  – Hardwired control: the CU is made up of sequential and combinational circuits that generate the control signals.
  – Microprogrammed control: a control memory on the processor contains microprograms that activate the necessary control signals.
• We will consider a hardwired implementation of the control unit for the Basic Computer.


Lecture – 15:

• Memory hierarchy and its organization
• Need of memory hierarchy
• Locality of reference principle

In the last units we studied the various instructions, data and registers associated with our computer organization. Let us now come to the microarchitecture of the computer, of which an important part is memory. Let us study what memory is and what the various types of memory available are.

The memory unit is a very essential component of a computer, used for storing programs and data. We use main memory for running programs, and additional capacity for storage. We have various levels of memory units, arranged in a memory hierarchy.

MEMORY HIERARCHY

The memory hierarchy is designed to obtain the highest possible access speed while minimizing the total cost of the memory system.

The various components are:

Main Memory: The memory unit that communicates directly with CPU. The programs and data currently needed by the processor reside in main memory.

Auxiliary Memory : This is made of devices that provide backup storage. Example : Magnetic tapes , magnetic disks etc.

Cache memory : This is the memory which lies in between your main memory and CPU.

[Fig: Memory hierarchy components: magnetic tapes and magnetic disks connect through the I/O processor to main memory, which communicates with the CPU through cache memory.]



In this hierarchy we have magnetic tapes at the lowest level, which means they are very slow and very cheap. Moving to the upper levels, such as main memory, we get increased speed but increased cost per bit.

Thus we can conclude that as we go towards the upper levels:
- Price increases
- Speed increases
- Cost per bit increases
- Access time decreases
- Size decreases

Many operating systems are designed to enable the CPU to process a number of independent programs concurrently. This concept is called multiprogramming. It is made possible by the existence of 2 programs residing in different parts of the memory hierarchy at the same time. Example: CPU and I/O transfer.

The locality of reference, also known as the locality principle, is the phenomenon, that the collection of the data locations referenced in a short period of time in a running computer, often consists of relatively well predictable clusters.

Analysis of a large number of typical programs has shown that the references to memory at any given interval of time tend to be confined within a few localized areas in memory. This phenomenon is known as locality of reference

[Fig: Memory hierarchy, from top to bottom: register, cache, main memory, magnetic disk, magnetic tape; the fastest and most expensive storage is at the top, the slowest and cheapest at the bottom.]


Important special cases of locality are temporal, spatial, equidistant and branch locality.

Temporal locality: if at one point in time a particular memory location is referenced, then it is likely that the same location will be referenced again in the near future. There is a temporal proximity between the adjacent references to the same memory location. In this case it is common to make efforts to store a copy of the referenced data in special memory storage, which can be accessed faster. Temporal locality is a very special case of the spatial locality, namely when the prospective location is identical to the present location.

Spatial locality: if a particular memory location is referenced at a particular time, then it is likely that nearby memory locations will be referenced in the near future. There is a spatial proximity between memory locations referenced at almost the same time. In this case it is common to try to guess how big a neighbourhood around the current reference is worth preparing for faster access.

Equidistant locality: it is halfway between the spatial locality and the branch locality. Consider a loop accessing locations in an equidistant pattern, i.e. the path in the spatial-temporal coordinate space is a dotted line. In this case, a simple linear function can predict which location will be accessed in the near future.

Branch locality: if there are only a few possible alternatives for the prospective part of the path in the spatial-temporal coordinate space. This is the case when an instruction loop has a simple structure, or when the possible outcome of a small system of conditional branching instructions is restricted to a small set of possibilities. Branch locality is typically not a spatial locality, since the few possibilities can be located far away from each other.

Sequential locality: in a typical program the execution of instructions follows a sequential order, unless branch instructions create out-of-order execution. This also takes spatial locality into consideration, as sequential instructions are stored near each other.

In order to benefit from the very frequently occurring temporal and spatial kinds of locality, most information storage systems are hierarchical. Equidistant locality is usually supported by the diverse nontrivial increment instructions of processors. For branch locality, contemporary processors have sophisticated branch predictors, and on the basis of the prediction the memory manager of the processor tries to collect and preprocess the data of the plausible alternatives.

Reasons for locality

There are several reasons for locality. These reasons are either goals to achieve or circumstances to accept, depending on the aspect. The reasons below are not disjoint; in fact, the list below goes from the most general case to special cases.

68

Page 69: cao notess

Predictability: In fact, locality is merely one type of predictable behavior in computer systems. Luckily, many of the practical problems are decidable and hence the corresponding program can behave predictably, if it is well written.

Structure of the program: Locality occurs often because of the way in which computer programs are created, for handling decidable problems. Generally, related data is stored in nearby locations in storage. One common pattern in computing involves the processing of several items, one at a time. This means that if a lot of processing is done, the single item will be accessed more than once, thus leading to temporal locality of reference. Furthermore, moving to the next item implies that the next item will be read, hence spatial locality of reference, since memory locations are typically read in batches.

Linear data structures: Locality often occurs because code contains loops that tend to reference arrays or other data structures by indices. Sequential locality, a special case of spatial locality, occurs when relevant data elements are arranged and accessed linearly. For example, the simple traversal of elements in a one-dimensional array, from the base address to the highest element, would exploit the sequential locality of the array in memory. The more general equidistant locality occurs when the linear traversal is over a longer area of adjacent data structures of identical structure and size, and when not the whole structures are accessed but only the mutually corresponding elements of each structure. This is the case when a matrix is represented as a sequential array of rows and the requirement is to access a single column of the matrix.

Use of locality in general

If, most of the time, a substantial portion of the references aggregate into clusters, and if the shape of this system of clusters can be well predicted, then it can be used for speed optimization. There are several ways to benefit from locality. The common techniques for optimization are:

to increase the locality of references: this is achieved usually on the software side.

to exploit the locality of references: this is achieved usually on the hardware side. Temporal and spatial locality can be capitalized on by hierarchical storage hardware. Equidistant locality can be used by appropriately specialized processor instructions; this possibility is not only the responsibility of the hardware but of the software as well, whose structure must be suitable for compiling a binary program that calls the specialized instructions in question. Branch locality is a more elaborate possibility, hence more development effort is needed, but there is a much larger reserve for future exploration in this kind of locality than in all the remaining ones.


Lecture – 16:

• Main Memory
  o RAM chip organization
  o ROM chip organization
• Expansion of main memory
  o Memory connections to CPU
  o Memory address map

Till now we have discussed the memory interconnections and their comparisons. Let us take each level in detail.

Main Memory: Main memory is a large (with respect to cache memory) and fast (with respect to magnetic tapes, disks, etc.) memory used to store programs and data during computer operation. The I/O processor manages data transfers between auxiliary memory and main memory.

Main memory is available in 2 types: RAM and ROM. The principal technology used for main memory is based on semiconductor integrated circuits.
RAM: This is the part of main memory where we can both read and write data.

Typical RAM chip:
[Fig: Typical 128 x 8 RAM chip: chip select inputs CS1 and CS2, read and write controls RD and WR, a 7-bit address AD7, and a bidirectional 8-bit data bus.]

CS1 and CS2 are used to enable or disable a particular RAM chip.

We have corresponding truth table as:

CS1  CS2  RD  WR   Memory function   State of data bus
 0    0    x   x   Inhibit           High-impedance
 0    1    x   x   Inhibit           High-impedance
 1    0    0   0   Inhibit           High-impedance
 1    0    0   1   Write             Input data to RAM
 1    0    1   x   Read              Output data from RAM
 1    1    x   x   Inhibit           High-impedance


The RAM is enabled when CS1 is 1 and CS2 is 0; for any other chip-select combination we have the inhibit operation and the data bus stays in the high-impedance state. Even with the chip selected, if both RD and WR are 0 no operation takes place, and the RAM remains in the high-impedance state. The RD pin indicates that the RAM is being used for a read operation; similarly, the WR pin shows that a write operation is being performed. If both WR and RD are high we choose the read operation, or else we would have inconsistency of data.

Since we have a 128 x 8 RAM, we have 128 words, each word 8 bits long. Thus we need an 8-bit data bus to transfer the data, and it is bidirectional. To access the 128 = 2^7 words we need 7 address bits.

Integrated circuit RAM chips are available in 2 modes: static memory and dynamic memory (both are discussed in a later lecture).

ROM: We have the ROM enabled when CS1 is 1 and CS2 is 0. We need no WR pin, as ROM does not allow write operations; nor do we need a RD pin, because if the ROM is enabled it can only be for a read operation.

Since we have a 512 x 8 ROM, we have 512 words, each word 8 bits long. Thus we need an 8-bit data bus to transfer the data, but it is unidirectional, as the ROM only allows reading. To access the 512 = 2^9 words we need 9 address bits.

Typical ROM chip

[Fig: Typical 512 x 8 ROM chip: chip select inputs CS1 and CS2, a 9-bit address AD9, and a unidirectional 8-bit data bus.]


Memory Expansion:

Sometimes we need to combine RAM or ROM chips to expand memory. As a similar case, suppose we need 512 words of RAM built from 128-word RAM chips, and we also need 512 words of ROM.

In this we will have 4 RAMs of 128 words each and one 512-word ROM.

To access a particular word of memory we proceed in 3 steps:
1. To access a particular word within a chip: we need 7 bits for a 128-word RAM, or 9 bits for the 512-word ROM.
2. To choose a particular RAM out of the four: we need 2 bits, and thus a 2 x 4 decoder.
3. To choose between RAM and ROM: we need 1 more bit to select one out of the two.

To show these connections to the CPU we have the following circuit:
- We have 4 RAMs and one ROM.
- Address lines 1-7 are given to all the RAMs.
- Address lines 1-9 go to the ROM.
- 2 bits, the 8th and 9th, are used to select among the 4 RAMs, so we use a 2 x 4 decoder.
- To distinguish between RAM and ROM we use one more bit, the 10th: if the 10th bit is low a RAM is enabled, otherwise the ROM is enabled.


Fig: Memory connection to CPU
[The CPU drives a 16-bit address bus and the data bus; address lines 1-7 go to each 128 x 8 RAM chip (1-9 to the 512 x 8 ROM), lines 8 and 9 feed a 2 x 4 decoder that drives the four RAM chip selects, line 10 distinguishes RAM from ROM, and lines 11-16 are not used in this configuration.]

To represent them properly we take the help of memory address map:


Component   Hexadecimal address   Address bus (bits 10 9 8 7 6 5 4 3 2 1)
RAM 1       0000 - 007F           0 0 0 x x x x x x x
RAM 2       0080 - 00FF           0 0 1 x x x x x x x
RAM 3       0100 - 017F           0 1 0 x x x x x x x
RAM 4       0180 - 01FF           0 1 1 x x x x x x x
ROM         0200 - 03FF           1 x x x x x x x x x


We have used x (don't care) for bits 1-7 of each RAM row and bits 1-9 of the ROM row: whatever their values, 0 or 1, the address still lies within that particular chip's range.

2 bits, the 8th and 9th, are used to select among the 4 RAMs:
  o RAM 1 – 0 0
  o RAM 2 – 0 1
  o RAM 3 – 1 0
  o RAM 4 – 1 1

To identify between RAM and ROM we used one more bit i.e. 10th bit. That means if we have value of 10th bit as low RAM will be enabled else ROM will be enabled.
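A minimal Python sketch of this decoding (bit positions are written 0-indexed here, so the notes' 10th bit is bit 9 in the code):

```python
def decode(address):
    if address & (1 << 9):                # 10th bit high -> ROM enabled
        return "ROM", address & 0x1FF     # 9-bit word offset within the ROM
    chip = (address >> 7) & 0b11          # the 8th and 9th bits feed the 2x4 decoder
    return f"RAM {chip + 1}", address & 0x7F   # 7-bit offset within that RAM

print(decode(0x0000))   # ('RAM 1', 0)
print(decode(0x00FF))   # ('RAM 2', 127)
print(decode(0x0200))   # ('ROM', 0)
```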


Lecture – 17:

• Static RAM and Dynamic RAM
• Associative Memory

In the last lecture we discussed the RAM and ROM chips and their expansion mechanisms. Let us now discuss the various types of RAM.

Static Random Access Memory (SRAM) is a type of semiconductor memory where the word static indicates that, unlike dynamic RAM (DRAM), it does not need to be periodically refreshed, as SRAM uses bistable latching circuitry to store each bit. SRAM exhibits data remanence,[1] but is still volatile in the conventional sense that data is eventually lost when the memory is not powered.

Dynamic random access memory (DRAM) is a type of random access memory that stores each bit of data in a separate capacitor within an integrated circuit. Since real capacitors leak charge, the information eventually fades unless the capacitor charge is refreshed periodically. Because of this refresh requirement, it is a dynamic memory as opposed to SRAM and other static memory.

The advantage of DRAM is its structural simplicity: only one transistor and a capacitor are required per bit, compared to six transistors in SRAM. This allows DRAM to reach very high density. Unlike Flash memory, it is volatile memory (cf. non-volatile memory), since it loses its data when the power supply is removed.

Static RAM is a type of RAM that holds its data without external refresh, for as long as power is supplied to the circuit. This is contrasted to dynamic RAM

(DRAM), which must be refreshed many times per second in order to hold its data contents. SRAMs are used for specific applications within the PC, where their strengths outweigh their weaknesses compared to DRAM:

Simplicity: SRAMs don't require external refresh circuitry or other work in order for them to keep their data intact.

Speed: SRAM is faster than DRAM.

In contrast, SRAMs have the following weaknesses, compared to DRAMs:

Cost: SRAM is, byte for byte, several times more expensive than DRAM. Size: SRAMs take up much more space than DRAMs (which is part of why the

cost is higher).


We normally access memory by address, but sometimes we want to access memory by content value rather than by address, for example in search mechanisms. From this comes the concept of associative memory.

Associative Memory: This is the type of memory where we access data by searching or matching the contents and not by address value.

- Accessed by the content of the data rather than by an address- Also called Content Addressable Memory (CAM)

In this, the data we need to search for is kept in the argument register, which has the same length as the word size: since we have m words of n bits each, the argument register is n bits long. We also have M, the match register, which reports matches by setting the corresponding bits high; with one bit per word, the match register is m bits long.

For example, suppose we have to search for 1011 in a memory containing eight 4-bit words:

[Fig: Associative memory block diagram: the argument register (A) and key register (K) feed the associative memory array and logic (m words, n bits per word), with input, read and write lines; the match register (M) reports the matching words.]

1011
0111
1000
1100
0010
1011
0111
1011


In this we have three occurrences of 1011, at the 1st, 6th and 8th words. Thus the match register reads 10000101: high at the 1st, 6th and 8th positions, and low everywhere else.

Here the key register holds 1111, which means all the bits of the argument register are compared against every word of the associative memory. In case we want to use only some bits for checking, say we want all words ending with 1, the value of the key register will be 0001 (as we are matching only the last bit), and the value of the match register will be 11000111.
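A minimal Python sketch of this key-masked match; it reproduces both match-register values from the example above:

```python
def cam_match(words, argument, key):
    # A word matches when it agrees with the argument on every bit set in the key
    return [int((word ^ argument) & key == 0) for word in words]

words = [0b1011, 0b0111, 0b1000, 0b1100, 0b0010, 0b1011, 0b0111, 0b1011]

print(cam_match(words, 0b1011, 0b1111))  # [1, 0, 0, 0, 0, 1, 0, 1]
print(cam_match(words, 0b0001, 0b0001))  # [1, 1, 0, 0, 0, 1, 1, 1]
```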



Lecture – 18:

• Cache Memory:
  o Locality of reference
  o Associative mapped cache organization
  o Direct mapped cache organization
  o Set-associative mapped cache organization

Cache Memory: The basic idea of cache organization is that by keeping the most frequently accessed instructions and data in a fast memory, the cache memory, the average memory access time is reduced. Examples are important subprograms, iterative procedures, etc. If these active portions of the program and data are placed in a fast small memory, the average access time, and thus the total execution time of the program, can be reduced. Such a fast small memory is referred to as a cache memory, and it is placed between main memory and the CPU.

When the CPU needs to access memory, the cache is examined. If the word is found in the cache, it is read from this fast memory; this is called a cache hit. Otherwise we access the main memory; this is called a cache miss. The performance of cache memory is frequently measured in terms of the hit ratio.

Analysis of a large number of typical programs has shown that the references to memory in any given interval of time tend to be confined within a few localized areas of memory. This phenomenon is known as locality of reference.

The various types of locality of reference used are:
- Temporal locality: information which will be used in the near future is likely to be in use already (e.g. reuse of information in loops).
- Spatial locality: if a word is accessed, adjacent (nearby) words are likely to be accessed soon (e.g. related data items such as arrays are usually stored together; instructions are executed sequentially).
- Sequential locality: in a typical program the execution of instructions follows a sequential order unless branch instructions create out-of-order execution. This also takes spatial locality into consideration, as sequential instructions are stored near each other.

Performance of Cache Memory System

[Fig: The cache memory sits between the CPU and main memory.]


Hit ratio h: the fraction of memory accesses satisfied by the cache.
Te: effective memory access time of the cache memory system; Tc: cache access time; Tm: main memory access time.

Te = Tc + (1 - h) · Tm

Example: Tc = 0.4 µs, Tm = 1.2 µs, h = 0.85:
Te = 0.4 + (1 - 0.85) × 1.2 = 0.58 µs
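A one-line Python check of this example, using the formula exactly as the notes state it:

```python
def effective_access_time(tc, tm, h):
    return tc + (1 - h) * tm   # Te = Tc + (1 - h) * Tm

print(effective_access_time(tc=0.4, tm=1.2, h=0.85))   # ~0.58
```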

MEMORY AND CACHE MAPPING – Mapping Function:

The mapping function specifies the correspondence between main memory blocks and cache blocks. There are 3 types of mapping mechanisms:
- Associative mapping
- Direct mapping
- Set-associative mapping

ASSOCIATIVE MAPPING
- Any block location in the cache can store any block from memory -> most flexible (we can place a word of any address value anywhere).
- The mapping table is implemented in an associative memory -> fast, but very expensive (the associative logic device is costly; also the word size increases by the number of bits in the main memory address).
- The mapping table stores both the address and the content of the memory word.


In this we fetch the important words and place them into cache memory; but since we receive only address values when fetching data, we must store the address in the cache as well. In the cache this address is saved as content, so we must search for it by content, using the concept of associative memory: the address we need is placed in the argument register and searched for in the cache, and when it is found the corresponding data is fetched.

DIRECT MAPPING
- Each memory block has only one place where it can load in the cache.
- The mapping table is made of RAM instead of CAM.
- The n-bit memory address consists of 2 parts: k bits of index field and n - k bits of tag field.
- The n-bit address accesses main memory, and the k-bit index accesses the cache.

[Fig: Associative mapping cache (all numbers octal): the 15-bit address in the argument register is searched in a CAM holding address/data pairs such as 01000/3450 and 02777/6710, drawn from a main memory whose contents include 00000/1220, 00777/2340, 01000/3450, 01777/4560, 02000/5670 and 02777/6710.]


In this we divide the main memory address into 2 fields:
Index: the number of bits equal to those required to address the cache memory.
Tag: the remaining bits (total - index).

The reason for this division is that we place the content of a main memory word into the cache memory address given by the index bits. Example: in main memory we have 1220 at address 00000; dividing this address gives tag 00 and index 000, so we save 1220 at address 000 of cache memory. But address 01000 also has index 000, so to distinguish between them we save the tag value along with the data in the cache. Similarly, 2340, saved at 00777 in main memory, will be saved at cache address 777 with tag value 00, and so on.


Problem: In this case we cannot simultaneously store the data of addresses 00000, 01000 and 02000, since they all have the same index value; we have to replace one to store another. Similarly, we cannot store both 00777 and 01777. That means that even if we have free words in the cache, we cannot store two words with the same index.

[Fig: Addressing relationships between main and cache memory: the 15-bit address of the 32K x 12 main memory (data = 12 bits) splits into a 6-bit tag and a 9-bit index; the 9-bit index addresses the 512 x 12 cache (addresses 000-777 octal).]

[Fig: Direct mapping cache organization: main memory holds 1220 at 00000, 2340 at 00777, 3450 at 01000, 4560 at 01777, 5670 at 02000 and 6710 at 02777; the cache stores, at index address 000, tag 00 with data 1220, and at index 777, tag 02 with data 6710.]
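A minimal Python sketch of this tag/index behaviour (octal literals mirror the figures above); note how a second block with the same index evicts the first:

```python
def split(address):
    return address >> 9, address & 0o777   # 6-bit tag, 9-bit index

cache = {}                                  # index -> (tag, data)

def write(address, data):
    tag, index = split(address)
    cache[index] = (tag, data)              # a same-index block is replaced

def read(address):
    tag, index = split(address)
    if index in cache and cache[index][0] == tag:
        return cache[index][1]              # cache hit
    return None                             # miss -> fetch from main memory

write(0o00000, 0o1220)    # index 000, tag 00
write(0o02777, 0o6710)    # index 777, tag 02
print(read(0o00000))      # 656 (octal 1220): hit
print(read(0o01000))      # None: index 000 again, but tag 01 -> miss
```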


But direct mapping is relatively less expensive, and the word size is smaller than in the associative type of organization.

To avoid the problems of both the direct and associative organizations, we use the 3rd concept, set-associative mapping.

In this we can save more than one data value per index, and we also save the tags corresponding to each data value.

Index   Tag   Data   Tag   Data
000     01    3450   02    5670
777     02    6710   00    2340


Lecture – 17,18,19 and 20:

• Cache to memory write
  o Write-back policy
  o Write-through policy
• Cache coherence
  o Software precautions
  o Snoopy controller

In the last lecture we studied how, on the basis of the locality principle, important or repetitive data is placed in cache memory. That means we can change data in the cache; but since main memory holds a copy too, we need to make the changes in main memory as well. In this lecture we study how this updating is done and the various problems we face.

Cache to memory write: Once we access data in cache memory and make changes to it, we need to reflect these changes in main memory too. We adopt one of two policies for this.

Write-Through: when writing into memory,
- If hit, both cache and memory are written in parallel.
- If miss, memory is written.
- For a read miss, the missing block may be overloaded onto a cache block.
- Memory is always updated -> important when the CPU and DMA I/O are both executing.
- Slow, due to the memory access time.

Write-Back (Copy-Back): when writing into memory,
- If hit, only the cache is written.
- If miss, the missing block is brought to the cache and written there.
- For a read miss, the candidate block must be written back to memory.
- Memory is not kept up-to-date, i.e., the same item in cache and memory may have different values.
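A minimal Python sketch contrasting the bookkeeping of the two policies on a write hit (single words rather than blocks, for brevity):

```python
class Cache:
    def __init__(self, policy):
        self.policy = policy      # "write-through" or "write-back"
        self.data = {}            # address -> word
        self.dirty = set()        # used only by write-back

    def write(self, memory, addr, word):
        self.data[addr] = word
        if self.policy == "write-through":
            memory[addr] = word   # memory written in parallel with the cache
        else:
            self.dirty.add(addr)  # memory updated later, on eviction

    def evict(self, memory, addr):
        if self.policy == "write-back" and addr in self.dirty:
            memory[addr] = self.data[addr]   # copy the dirty word back
            self.dirty.discard(addr)
        self.data.pop(addr, None)

memory = {100: 0}
cache = Cache("write-back")
cache.write(memory, 100, 55)
print(memory[100])   # 0  -> main memory is stale (the coherence risk below)
cache.evict(memory, 100)
print(memory[100])   # 55
```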


The mechanism is chosen depending on 2 important considerations:
1. How frequent are the changes in cache memory?
2. Is main memory also used by some other cache memory?

The second case arises in a multiprocessor system, where all the processors use the same main memory but have their own separate cache memories.

One part of main memory, containing the value of X, is shared by all the processors' cache memories. Thus, whether we use the write-through or write-back policy, we will face issues of data inconsistency; this problem is known as cache coherence.

Maintaining Cache Coherency

• Shared cache
  – Disallow private caches
  – Access time delay
• Software approaches
  – Read-only data are cacheable
    • Private cache is for read-only data
    • Shared writable data are not cacheable
    • The compiler tags data as cacheable and non-cacheable
    • Degrades performance due to software overhead
  – Centralized global table
    • The status of each memory block is maintained in the CGT: RO (read-only) or RW (read and write)
    • All caches can have copies of RO blocks
    • Only one cache can have a copy of an RW block
• Hardware approaches
  – Snoopy cache controller
    • Cache controllers monitor all the bus requests from CPUs and IOPs
    • All caches attached to the bus monitor the write operations
    • When a word in a cache is written, memory is also updated (write-through)
    • Local snoopy controllers in all other caches check their memory to determine if they have a copy of that word; if they do, that location is marked invalid (a future reference to it causes a cache miss)


Lecture – 20:

• Goals of parallelism
  o Segmentation of the processor into functional units
• Amdahl's law

Parallel Processing

In the last unit we discussed the various types of instructions and the microinstructions generated; here we discuss how instructions can be selected for parallel processing.

Parallel processing is the term used for simultaneous execution of 2 or more instructions.

The various levels of parallel processing are:
- Job or program level
- Task or procedure level
- Inter-instruction level
- Intra-instruction level

The first type of parallelism is implemented by increasing the number of processors. The classification, given by M. J. Flynn, is on the basis of either instruction or data streams:
- SISD
- SIMD
- MISD
- MIMD

The other technique is that, instead of using more than one processor, we divide the work among various processing units; this is called segmentation of the processor into functional units.

Execution of Concurrent Events in the computing process to Achieve faster Computational Speed



Through this we save the cost of adding more and more processors. We have divided one processor into various functional units so that they can work simultaneously on different types of instructions: one instruction requiring a shift operation can run simultaneously with an instruction requiring addition on a single processor. Another way to improve performance is through pipelining. Pipelining is a technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all the other segments.

Whether we add processors or segment the processor into functional units, the speedup we achieve depends on the percentage of the computation that can execute in parallel. This concept gave birth to Amdahl's law.

[Fig: Processor with multiple functional units: an adder/subtractor, multiplier/divisor, logic unit, shift unit, incrementer, and floating-point multiply, add/subtract and divide units, all connected to the processor registers, which connect to memory.]


Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahl, and is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup from using multiple processors. The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours on a single processor core, and a particular 1-hour portion cannot be parallelized while the remaining 19 hours (95%) can be, then regardless of how many processors we devote to the parallelized execution, the minimum execution time cannot be less than that critical 1 hour. Hence the speedup is limited to at most 20x.

Thus Gene Amdahl, in his 1967 paper titled "Validity of the single processor approach to achieving large scale computing capabilities", states:

If F is the fraction of the calculation that is sequential and (1 - F) is the fraction that can be parallelized, then the maximum speedup that can be achieved by using P processors is:

Speedup = 1 / (F + (1 - F)/P)

For example:

1. If 50% is the portion which can be parallelized, adding 1 more processor gives a speedup of only 1/(0.50 + 0.50/2) = 1.333 rather than 2. Similarly, with 4 more processors (5 in total) the speedup is only 1.667.
2. If 75% of the portion can be parallelized, then the speedup from adding one more processor is 1.6, which is more than the 1.33 of the 50% case.

Speedup as a function of the percentage of parallel execution:

Number of      Percent parallel execution
processors     0     50      75     90     95     100
2              1     1.33    1.6    1.82   1.9    2
5              1     1.667   2.5    3.57   4.17   5
10             1     1.81    3.08   5.26   6.9    10
100            1     1.98    3.88   9.17   16.8   100

Thus we can deduce that the speedup depends directly on the percentage of the parallel portion and on the number of processors, but only up to a limit. The extreme cases are:

In the case of a 0% parallel portion, whatever the number of processors, there is no increase in speedup.


In the case of a 100% parallel portion, the maximum speedup is equal to the number of processors used.
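A minimal Python sketch that recomputes the rows of the table above from the formula:

```python
def speedup(parallel_percent, processors):
    f = 1 - parallel_percent / 100        # F: the sequential fraction
    return 1 / (f + (1 - f) / processors)

for p in (2, 5, 10, 100):
    row = [round(speedup(pct, p), 2) for pct in (0, 50, 75, 90, 95, 100)]
    print(p, row)
# 2   [1.0, 1.33, 1.6, 1.82, 1.9, 2.0]
# 100 [1.0, 1.98, 3.88, 9.17, 16.81, 100.0]
```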


Lecture – 21:

• Pipelining or pipeline processing
  o Example
  o Data table

Another way to improve performance is through pipelining. Pipelining is a technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all the other segments. Example: suppose we have the instruction A * B + C and we have to execute it for various data values. We can represent it as:

In this we have divided the execution of this instruction into various steps (segments); as we work in one segment, we do not leave the other segments idle: we work on the segments simultaneously, but for different data values. If we executed the data sequentially we would take 7 × 3 = 21 clock pulses (considering 1 pulse for each segment). To decrease the number of clock pulses we take the help of pipelining.


[Fig: Pipeline for Ai * Bi + Ci, i = 1, 2, ..., 7: Segment 1 loads Ai and Bi from memory into R1 and R2; Segment 2 multiplies R1 and R2 into R3 while Ci is loaded into R4; Segment 3 adds R3 and R4 into R5.]


This is implemented as:
Step 1: In the first clock pulse, R1 is loaded with the first value of A (A1), and similarly R2 is loaded with B1.
Step 2: In the second clock pulse, the values of R1 and R2 are given to the multiplier (result in R3) and C1 is loaded into R4. R1 and R2 are then free, so we load the values A2 and B2 into them. That means in pulse 2 both segment 1 and segment 2 are working.
Step 3: In the third clock pulse, the product in R3 and the value C1 in R4 are given to the adder. The multiplier and R4 are then free, so we multiply the R1 and R2 values A2 and B2 and load C2 into R4. Segment 1 is also free, so we take the values A3 and B3 into R1 and R2.
Similarly, in the next step:
In segment 1: R1 and R2 take the values A4 and B4.
In segment 2: R1 and R2 are multiplied and R4 is loaded with C3.
In segment 3: the adder is working on A2 * B2 + C2.

Data table:

So the clock pulse count for sequential access,
(number of steps) × (number of data streams),
is replaced with
(number of steps) + (number of data streams) - 1.

Clock    Segment 1      Segment 2            Segment 3
pulse    R1     R2      R3          R4       R5
1        A1     B1      ------      -----    -------------
2        A2     B2      A1 * B1     C1       -------------
3        A3     B3      A2 * B2     C2       A1 * B1 + C1
4        A4     B4      A3 * B3     C3       A2 * B2 + C2
5        A5     B5      A4 * B4     C4       A3 * B3 + C3
6        A6     B6      A5 * B5     C5       A4 * B4 + C4
7        A7     B7      A6 * B6     C6       A5 * B5 + C5
8        ---    ---     A7 * B7     C7       A6 * B6 + C6
9        ---    ---     ------      -----    A7 * B7 + C7
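A minimal Python sketch that drives the three segments clock pulse by clock pulse (with assumed numeric data); it finishes the 7 data streams in 3 + 7 - 1 = 9 pulses, matching the data table:

```python
A = [1, 2, 3, 4, 5, 6, 7]
B = [10, 20, 30, 40, 50, 60, 70]
C = [5, 5, 5, 5, 5, 5, 5]

r1 = r2 = r3 = r4 = None
results, pulses = [], 0

for i in range(len(A) + 2):         # 7 streams + (3 - 1) pulses to drain
    pulses += 1
    if r3 is not None:
        results.append(r3 + r4)     # segment 3: adder, R5 <- R3 + R4
    if r1 is not None:
        r3, r4 = r1 * r2, C[i - 1]  # segment 2: R3 <- R1 * R2, R4 <- Ci
    else:
        r3 = r4 = None
    if i < len(A):
        r1, r2 = A[i], B[i]         # segment 1: R1 <- Ai, R2 <- Bi
    else:
        r1 = r2 = None

print(pulses, results)   # 9 [15, 45, 95, 165, 255, 365, 495]
```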


Lecture – 22:

• Instruction level parallelism
  o Instruction steps
  o Example
  o Flowchart
• Pipelining hazards

Instruction Pipelining: The pipelining concept we discussed in the last lecture was an example of SIMD (a single instruction on various data values). We can also segment the steps of an instruction itself and execute them with the help of pipelining; this phenomenon is known as instruction pipelining, or instruction-level parallelism.

The steps of a particular instruction are:

[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place

* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase

==> 4-Stage Pipeline

[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation

Say we have 3 instructions. If we execute them sequentially, we get the following scenario:

[Diagram: sequential execution: instruction i runs FI, DA, FO, EX to completion before i+1 begins, and i+1 before i+2; the 3 instructions take 12 cycles.]


But if we use pipelining the scenario will be

But it has some exceptions, as in the case of branching and interrupts. Let us discuss this with the help of a flowchart.

In the diagram we have instruction pipelining with sequential execution up to instruction 3. When the 3rd instruction is decoded, we learn that it branches to some other address, so the fourth instruction that was fetched is not the next instruction to be executed. We discard it and wait until the 3rd instruction has executed, which gives us the address to be executed next; the instruction at that address becomes the 4th instruction, and the pipelining continues.

[Diagram: pipelined execution: while instruction i is in DA, instruction i+1 is in FI, and so on; the stages of i, i+1 and i+2 overlap, so the 3 instructions finish in 6 cycles instead of 12.]

[Diagram: timing of the instruction pipeline over steps 1-13 for instructions 1-7, where instruction 3 is a branch: instruction 4's fetch is discarded, the pipeline waits until instruction 3's EX completes, and fetching resumes from the branch target.]
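A minimal Python sketch of this timing (one stage per clock, assumed ideal otherwise): with no branch, 7 instructions finish in 10 cycles; when instruction 3 is a branch, the discarded fetch and the wait for its EX stretch this to 13 cycles, as in the diagram.

```python
STAGES = ["FI", "DA", "FO", "EX"]

def finish_time(n, branch_at=None):
    """Clock cycle when the last of n instructions leaves EX.
    branch_at is the 0-indexed position of a branch instruction."""
    start = []
    for i in range(n):
        begin = 1 if i == 0 else start[-1] + 1   # normally one fetch per clock
        if branch_at is not None and i == branch_at + 1:
            # the prefetched instruction is discarded; refetch only after
            # the branch finishes EX and the target address is known
            begin = start[branch_at] + len(STAGES)
        start.append(begin)
    return start[-1] + len(STAGES) - 1

print(finish_time(7))                # 10
print(finish_time(7, branch_at=2))   # 13 (instruction 3 branches)
```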


[Flowchart: four-segment instruction pipeline. Segment 1: fetch instruction from memory. Segment 2: decode instruction and calculate effective address, then test Branch? (yes -> update PC and empty the pipe). Segment 3: fetch operand from memory. Segment 4: execute instruction, then test Interrupt? (yes -> empty the pipe and handle the interrupt); then update PC and continue with the next instruction.]


Limitations of Pipelining / Pipelining Hazards: There are various advantages and uses of pipelining, but there are also some problem areas we face.

The major hazards faced in pipelined execution are:

Structural Hazards: occur when some resource has not been duplicated enough to allow all combinations of instructions in the pipeline to execute.
Example: with one memory port, a data fetch and an instruction fetch cannot be initiated in the same clock cycle.
The pipeline is stalled for a structural hazard, e.g., two loads with a one-port memory.

[Diagram: with a one-port memory, instruction i+2's FI stalls until instruction i's FO has finished using the memory port.]


-> Two-port memory will serve without stall

Data Hazards: occur when the execution of an instruction depends on the result of a previous instruction, e.g.:
  ADD R1, R2, R3
  SUB R4, R1, R5
Data hazards can be dealt with by either hardware or software techniques.

HW techniques:
- Interlock: hardware detects the data dependencies and delays the scheduling of the dependent instruction by stalling enough clock cycles.
- (Operand) forwarding (bypassing, short-circuiting): accomplished by a data path that routes a value from a source (usually an ALU) to a user, bypassing a designated register. This allows the value being produced to be used at an earlier stage in the pipeline than would otherwise be possible.

SW technique:
- Instruction scheduling (by the compiler) for delayed load.

Control Hazards
- Prefetch target instruction: fetch instructions in both streams, branch not taken and branch taken; both are saved until the branch is executed, then the right instruction stream is selected and the wrong stream discarded.
- Branch target buffer (BTB; associative memory): each entry holds the address of a previously executed branch together with the target instruction and the next few instructions. When fetching an instruction, search the BTB: if found, fetch the instruction stream from the BTB; if not, fetch the new stream and update the BTB.
- Loop buffer (high-speed register file): storage of an entire loop that allows executing the loop without accessing memory.
- Branch prediction: guessing the branch condition and fetching an instruction stream based on the guess; a correct guess eliminates the branch penalty.
- Delayed branch: the compiler detects the branch and rearranges the instruction sequence by inserting useful instructions that keep the pipeline busy in the presence of a branch instruction.


Lecture – 23:

• Vector Processors
• Supercomputers
• Memory Interleaving
• Array Processors
  o SIMD array processor
  o Attached array processor

There are various type of processors which perform particular operations.

Vector Processors: One more type of processor we use is the vector processor, which has the ability to process vectors, and related data structures such as matrices and multi-dimensional arrays, much faster than conventional computers.

Vector processing applications are problems that can be efficiently formulated in terms of vectors:
– Long-range weather forecasting
– Petroleum exploration
– Seismic data analysis
– Medical diagnosis
– Aerodynamics and space flight simulations
– Artificial intelligence and expert systems
– Mapping the human genome
– Image processing

Vector processors may also be pipelined. Example:

    DO 20 I = 1, 100
 20 C(I) = B(I) + A(I)

Conventional computer:
    Initialize I = 0
 20 Read A(I)
    Read B(I)
    Store C(I) = A(I) + B(I)
    Increment I = I + 1
    If I ≤ 100 goto 20

Vector computer:
    C(1:100) = A(1:100) + B(1:100)


Supercomputers: "Supercomputer" is a broad term for one of the fastest computers currently available. Such computers are typically used for number crunching, including scientific simulations, (animated) graphics, analysis of geological data (e.g. in petrochemical prospecting), structural analysis, computational fluid dynamics, physics, chemistry, electronic design, nuclear energy research and meteorology. Perhaps the best known supercomputer manufacturer is Cray Research. The chief difference between a supercomputer and a mainframe is that a supercomputer channels all its power into executing a few programs as fast as possible, whereas a mainframe uses its power to execute many programs concurrently. A supercomputer is a computer that leads the world in terms of processing capacity, particularly speed of calculation, at the time of its introduction. The first supercomputers were introduced in the 1960s, led primarily by Seymour Cray at Control Data Corporation (CDC), which led the market into the 1970s until Cray split off to form his own company, Cray Research, which then took over the market. In the 1980s a large number of smaller competitors entered the market, a parallel to the creation of the minicomputer market a decade earlier; many of them disappeared in the mid-1990s "supercomputer market crash". Today supercomputers are typically one-off custom designs produced by "traditional" companies such as IBM and HP, who purchased many of the 1980s companies to gain their experience.



Technologies developed for supercomputers include:

Vector processing : A vector processor, or array processor, is a CPU design where the instruction set includes operations that can perform mathematical operations on multiple data elements simultaneously. This is in contrast to a scalar processor which handles one element at a time using multiple instructions. The vast majority of CPUs are scalar (or close to it). Vector processors were common in the scientific computing area, where they formed the basis of most supercomputers through the 1980s and into the 1990s, but general increases in performance and processor design saw the near disappearance of the vector processor as a general-purpose CPU.

Liquid cooling : An uncommon practice is to submerse the computer's components in a thermally conductive liquid. Personal computers that are cooled in this manner do not generally require any fans or pumps, and may be cooled exclusively by passive heat exchange between the computer's parts, the cooling fluid and the ambient air.

Non-Uniform Memory Access (NUMA): Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA) is a computer memory design used in multiprocessors, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors.

Striped disks (the first instance of what was later called RAID): In computer data storage, data striping is the segmentation of logically sequential data, such as a single file, so that segments can be assigned to multiple physical devices (usually disk drives in the case of RAID storage, or network interfaces in the case of Grid-oriented Storage) in a round-robin fashion and thus written concurrently.

Parallel filesystems: In computing, a file system (often also written as filesystem) is a method for storing and organizing computer files and the data they contain to make it easy to find and access them. File systems may use a data storage device such as a hard disk or CD-ROM and involve maintaining the physical location of the files; they might provide access to data on a file server by acting as clients for a network protocol (e.g., NFS, SMB, or 9P clients); or they may be virtual and exist only as an access method for virtual data (e.g., procfs). It is distinguished from a directory service and registry.

Memory Interleaving

Also known as MULTIPLE MEMORY MODULE AND INTERLEAVING


Memory interleaving is the term used because we combine several memory modules and coordinate them: addresses are distributed among the modules, and data is exchanged with whichever module holds the addressed word.
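A minimal Python sketch (assuming 4-way low-order interleaving) of how consecutive addresses are assigned across modules so that sequential accesses can proceed in parallel:

```python
N_MODULES = 4

def locate(address):
    # Low-order address bits select the module; the remaining bits select
    # the word within that module
    return address % N_MODULES, address // N_MODULES

for addr in range(8):
    module, word = locate(addr)
    print(f"address {addr} -> module {module}, word {word}")
# consecutive addresses 0..3 fall in modules 0..3 and can be accessed together
```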

Array Processors:

A microprocessor that executes one instruction at a time but on an array or table of data at the same time rather than on single data elements.

• An array processor performs a single instruction in multiple execution units in the same clock cycle.
• The different execution units have the same instruction, operating on the same set of vectors in the array.

Features of array processors:
• Use of parallel execution units for processing different vectors of the arrays.
• Use of memory interleaving, n memory address registers and n memory data registers in the case of k pipelines, and use of vector register files.

A computer/processor that has an architecture especially designed for processing arrays (e.g. matrices) of numbers. The architecture includes a number of processors (say 64 by 64) working simultaneously, each handling one element of the array, so that a single operation can apply to all elements of the array in parallel. To obtain the same effect in a conventional processor, the operation must be applied to each element of the array sequentially, and so consequently much more slowly.


An array processor may be built as a self-contained unit attached to a main computer via an I/O port or internal bus; alternatively, it may be a distributed array processor where the processing elements are distributed throughout, and closely linked to, a section of the computer's memory.

Array processors are very powerful tools for handling problems with a high degree of parallelism. They do, however, demand a modified approach to programming. The conversion of conventional (sequential) programs to serve array processors is not a trivial task, and it is sometimes necessary to select different (parallel) algorithms to suit the parallel approach.

Array processors are most importantly implemented in 2 ways:

SIMD array processors: A SIMD array processor is a computer with multiple processing units operating in parallel. The processing units are synchronized to perform the same operation under the control of a common control unit, thus providing a single instruction stream, multiple data stream organization.

• Data-level parallelism in an array processor: for example, the multiplier-unit pipelines operate in parallel, computing x[i] × y[i] in a number of parallel units
• Its multiple functional units perform the actions simultaneously

Fig: SIMD Array Processor
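A minimal sketch of the lock-step idea, taking element-wise multiplication as the common instruction (the list comprehension merely stands in for the parallel processing elements):

def simd_multiply(x, y):
    # Conceptually, every processing element applies the same
    # instruction to its own element in the same clock cycle.
    assert len(x) == len(y)
    return [xi * yi for xi, yi in zip(x, y)]

print(simd_multiply([1, 2, 3, 4], [10, 20, 30, 40]))   # [10, 40, 90, 160]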

Attached array processors:

The various components of this structure are:
General-purpose computer: used for general processing
Main memory: memory attached to the general-purpose computer
I/O interface: connects the two processors
Attached array processor: the array processor required for heavy computations
Local memory: attached to the array processor


• The attached array processor has an input/output interface to a common processor and another interface with a local memory
• The local memory interconnects with main memory

Fig: Attached Array Processor


Lecture – 24:

 Instruction Codes
 Types of Instructions
o Memory reference type of instructions
o Register reference type of instructions
o I/O reference type of instructions

Instruction Codes: In the last topics, we studied the various types of organization of our computer. Today we will study the various types of instructions supported by our computer.

Before that, let's look at the cycle of an instruction.

• In the Basic Computer, a machine instruction is executed in the following cycle:
1. Fetch an instruction from memory
2. Decode the instruction
3. Read the effective address from memory if the instruction has an indirect address
4. Execute the instruction

• After an instruction is executed, the cycle starts again at step 1, for the next instruction

• Note: Every different processor has its own (different) instruction cycle

The Basic Computer instruction format we have is (OP-code = 000 ~ 110):

15  14-12   11-0
 I  Opcode  Address

This is the type in which we refer to memory to fetch our operands; thus these instructions are called memory-reference instructions.

In this case, I is the mode field, which tells us whether the operand is fetched using direct or indirect addressing, i.e.
o I = 0: Direct address
o I = 1: Indirect address

Opcode – This field tells us the type of operation to be performed. Since it is 3 bits wide, the maximum number of memory-reference operations possible is 2^3 = 8.

The operations possible with memory-reference instructions are listed in the table further below.


Address – This field tells us the address from which we have to fetch the operand.

The effective address of the instruction is in AR and was placed there during timing signal T2 when I = 0, or during timing signal T3 when I = 1.

- The memory cycle is assumed to be short enough to complete within a CPU cycle
- The execution of a memory-reference instruction starts with T4

AND to AC
D0T4: DR ← M[AR]   (read operand)
D0T5: AC ← AC ∧ DR, SC ← 0   (AND with AC)

ADD to AC
D1T4: DR ← M[AR]   (read operand)
D1T5: AC ← AC + DR, E ← Cout, SC ← 0   (add to AC and store carry in E)

LDA: Load to AC
D2T4: DR ← M[AR]
D2T5: AC ← DR, SC ← 0

STA: Store AC
D3T4: M[AR] ← AC, SC ← 0

BUN: Branch Unconditionally
D4T4: PC ← AR, SC ← 0

BSA: Branch and Save Return Address
M[AR] ← PC, PC ← AR + 1

Symbol   Operation decoder   Symbolic description
AND      D0                  AC ← AC ∧ M[AR]
ADD      D1                  AC ← AC + M[AR], E ← Cout
LDA      D2                  AC ← M[AR]
STA      D3                  M[AR] ← AC
BUN      D4                  PC ← AR
BSA      D5                  M[AR] ← PC, PC ← AR + 1
ISZ      D6                  M[AR] ← M[AR] + 1, if M[AR] + 1 = 0 then PC ← PC + 1


BSA:
D5T4: M[AR] ← PC, AR ← AR + 1
D5T5: PC ← AR, SC ← 0

ISZ: Increment and Skip-if-Zero
D6T4: DR ← M[AR]
D6T5: DR ← DR + 1
D6T6: M[AR] ← DR, if (DR = 0) then (PC ← PC + 1), SC ← 0
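The register transfers above can be mimicked in a toy Python model (a sketch of the behaviour only; the common bus, timing signals and other hardware details are deliberately left out):

MASK16 = 0xFFFF   # registers and memory words are 16 bits wide

class BasicComputer:
    def __init__(self):
        self.M = [0] * 4096                     # 4096 x 16 memory
        self.AC = self.DR = self.AR = self.PC = self.E = 0

    def AND(self):                              # D0: AC <- AC AND M[AR]
        self.DR = self.M[self.AR]
        self.AC &= self.DR

    def ADD(self):                              # D1: AC <- AC + M[AR], E <- Cout
        s = self.AC + self.M[self.AR]
        self.AC, self.E = s & MASK16, s >> 16

    def LDA(self):                              # D2: AC <- M[AR]
        self.AC = self.M[self.AR]

    def STA(self):                              # D3: M[AR] <- AC
        self.M[self.AR] = self.AC

    def BUN(self):                              # D4: PC <- AR
        self.PC = self.AR

    def BSA(self):                              # D5: M[AR] <- PC, PC <- AR + 1
        self.M[self.AR] = self.PC
        self.PC = self.AR + 1

    def ISZ(self):                              # D6: M[AR]++, skip if zero
        self.DR = (self.M[self.AR] + 1) & MASK16
        self.M[self.AR] = self.DR
        if self.DR == 0:
            self.PC = (self.PC + 1) & 0xFFF

bc = BasicComputer()
bc.AC, bc.AR, bc.M[100] = 5, 100, 7
bc.ADD()
print(bc.AC, bc.E)                              # 12 0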

Fig: Memory and PC before and after execution of BSA 135. The instruction '0 BSA 135' at location 20 is fetched (PC = 21, AR = 135); execution stores the return address 21 in M[135] and sets PC = 136, where the subroutine begins; the subroutine returns with the indirect branch '1 BUN 135'.


Register – Reference Instructions

The instruction format to represent the register -reference type of instructions is:

(OP-code = 111, I = 0)

In this case the four high-order bits (bits 15-12) are fixed, i.e. 0111, and the remaining 12 bits (B0 to B11) each represent an individual operation to be performed:

15 12   11               0
0 1 1 1   Register operation

Fig: Flowchart for the memory-reference instructions (the micro-operations of AND, ADD, LDA, STA, BUN, BSA and ISZ at timing signals D0T4 through D6T6, as listed in the previous section)


In these types of instructions, the instruction itself tells us the operation and the register on which it has to be performed.

- D7 = 1, I = 0
- The register-reference instruction is specified in bits B0 ~ B11 of IR
- Execution starts with timing signal T3

r = D7I'T3 => Register-reference instruction
Bi = IR(i), i = 0, 1, 2, ..., 11

The register-reference operations are:

r: SC ← 0   (clear SC)
CLA rB11: AC ← 0
CLE rB10: E ← 0
CMA rB9: AC ← AC'
CME rB8: E ← E'
CIR rB7: AC ← shr AC, AC(15) ← E, E ← AC(0)
CIL rB6: AC ← shl AC, AC(0) ← E, E ← AC(15)
INC rB5: AC ← AC + 1
SPA rB4: if (AC(15) = 0) then (PC ← PC + 1)
SNA rB3: if (AC(15) = 1) then (PC ← PC + 1)
SZA rB2: if (AC = 0) then (PC ← PC + 1)
SZE rB1: if (E = 0) then (PC ← PC + 1)
HLT rB0: S ← 0 (S is a start-stop flip-flop)
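A sketch of decoding such an instruction in software (only a few of the twelve operations are shown; the widths follow the notes above):

def execute_register_reference(IR, AC, E, PC):
    assert (IR >> 12) == 0b0111, "not a register-reference instruction"
    B = [(IR >> i) & 1 for i in range(12)]       # B[i] = IR(i)
    if B[11]: AC = 0                             # CLA
    if B[10]: E = 0                              # CLE
    if B[9]:  AC = (~AC) & 0xFFFF                # CMA
    if B[5]:  AC = (AC + 1) & 0xFFFF             # INC
    if B[2] and AC == 0: PC = (PC + 1) & 0xFFF   # SZA
    return AC, E, PC

print(execute_register_reference(0b0111100000000000, AC=0x1234, E=1, PC=10))
# CLA clears AC -> (0, 1, 10)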

Input-Output Instructions

The instruction format to represent input-output type of instructions is (OP-code = 111, I = 1):

15 12   11          0
1 1 1 1   I/O operation

In this case the four high-order bits are fixed, i.e. 1111, and the remaining 12 bits (B0 to B11) each represent an individual operation to be performed.


To understand these operations, let's discuss a simple computer with input and output devices connected to it.

Now we will discuss how the control unit identifies what type of instruction is being executed.

Input-Output Configuration

Here are the details of the registers used in this organization:

INPR Input register - 8 bits: When we enter some value from the keyboard (or from any other input device), its alphanumeric code gets stored in INPR and is then moved to the accumulator.

OUTR Output register - 8 bits: Similar to INPR, OUTR is the register which holds the alphanumeric code it gets from the accumulator before it is printed on the printer (or displayed on the monitor).

AC Accumulator - 16 bits: The accumulator is the main processor register, which receives the first inputs and holds the final outputs.

FGI Input flag - 1 bit: This is a control flip-flop used to synchronize the timing difference between the input devices and the processor's speed.

FGO Output flag - 1 bit: Similar to FGI, this is a control flip-flop used to synchronize the timing difference between the output devices and the processor's speed.

IEN Interrupt enable - 1 bit: This is a flip-flop which tells us whether the operations may be interrupted or not.

Fig: Input-output configuration (the keyboard sends serial data through a receiver interface into INPR with flag FGI; the printer receives serial data through a transmitter interface from OUTR with flag FGO; INPR and OUTR communicate with AC over parallel paths)

Important points:


- The terminal sends and receives serial information
- The serial information from the keyboard is shifted into INPR
- The serial information for the printer is stored in OUTR
- INPR and OUTR communicate with the terminal serially and with the AC in parallel
- The flags are needed to synchronize the timing difference between the I/O devices and the computer

The process continues as follows. Initially, the input flag FGI is cleared to 0. When a key is struck on the keyboard, an 8-bit alphanumeric code is shifted into INPR and the input flag FGI is set to 1. As long as FGI is set to 1, no new information can be entered into INPR. The computer checks the flag bit; if it is 1, the information from INPR is transferred in parallel to AC and FGI is cleared to 0, meaning INPR is ready to take a new key input.

The operation is similar for output devices, except for the direction of flow. Initially the output flag FGO is set to 1. The computer checks the flag bit; if it is 1, the information from AC is transferred to OUTR and FGO is cleared to 0. The output device accepts the coded information, prints the corresponding character and, when the operation is complete, sets the flag back to 1. OUTR does not accept a new character until FGO is set back to 1.
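A minimal sketch of this FGI handshake (plain Python variables stand in for the hardware registers and flag):

FGI, INPR, AC = 0, 0, 0

def key_struck(code):                 # device side: a key is pressed
    global FGI, INPR
    if FGI == 0:                      # new data accepted only when flag clear
        INPR, FGI = code, 1

def cpu_poll_input():                 # computer side: check the flag
    global FGI, AC
    if FGI == 1:
        AC, FGI = INPR, 0             # AC <- INPR, ready for the next key

key_struck(0x41)                      # code for 'A' shifted into INPR, FGI = 1
cpu_poll_input()
print(hex(AC), FGI)                   # 0x41 0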

After understanding the operation of the I/O organization, let's discuss the various input-output instructions.

In these types of instructions, the instruction itself tells us the operation to be performed.

p = D7IT3 => Input-Output Instruction
Bi = IR(i), i = 6, ..., 11

That means only six operations are supported for input/output and interrupt handling, selected by bits B6 to B11; bits B0 to B5 hold no importance. The operations are:

p: SC ← 0   (clear SC)
INP pB11: AC(0-7) ← INPR, FGI ← 0   (input character to AC)
OUT pB10: OUTR ← AC(0-7), FGO ← 0   (output character from AC)
SKI pB9: if (FGI = 1) then (PC ← PC + 1)   (skip on input flag)
SKO pB8: if (FGO = 1) then (PC ← PC + 1)   (skip on output flag)
ION pB7: IEN ← 1   (interrupt enable on)
IOF pB6: IEN ← 0   (interrupt enable off)

INP: Transfers the character in INPR to AC(0-7) and clears FGI to 0.
OUT: Transfers AC(0-7) to OUTR and clears FGO to 0.
SKI: Skips the next instruction if the input flag FGI is set, i.e. if the input device has a character ready.


SKO: Similarly, skips the next instruction if the output flag FGO is set, i.e. if the output device is ready to accept a new character.

Note that these skip instructions are used together with a branch instruction in a program loop, so that the program keeps checking the flag until the device is ready.

ION: This turns interrupts on, i.e. operations can be interrupted, by setting the IEN (interrupt enable) flag to 1.

IOF: This clears IEN to 0, creating the condition in which no interrupt is possible.

To explain more about interrupts and the role of the IEN flag, let's discuss the interrupt cycle.

R is the interrupt flip-flop, which tells us whether the current execution is a normal fetch or an interrupt. Thus we check whether R = 0 or not. If R is 0, this is the case of the normal instruction cycle: we fetch, decode and execute the instruction, and in parallel check whether there is an interrupt or not. The system will only accept interrupts if IEN is 1; thus, if IEN is 0, there is no chance of an interrupt cycle. If IEN is 1, then we check the flags FGI and FGO: if both are 0, no device needs service, so no interrupt is taken; if either is 1, we can go for the interrupt, setting R to 1 and continuing with the interrupt cycle.

In case of an interrupt, we have to store somewhere the address of the next instruction that would have come next in normal execution. We store it at memory address 0 and set the value of PC to 1; we also clear IEN and R to 0 to prevent further interrupts until this interrupt cycle is completed.

Fig: Flowchart for the interrupt cycle (instruction cycle while R = 0; when IEN = 1 and FGI or FGO = 1, R ← 1; the interrupt cycle then stores the return address with M[0] ← PC, branches to location 1 with PC ← 1, and clears IEN and R)
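The decision in the flowchart can be sketched as follows (fetch_decode_execute is a stand-in stub; the dictionary fields mirror the flip-flops and registers described above):

def fetch_decode_execute(state):
    state["PC"] += 1                  # stand-in for a real instruction cycle

def clock(state):
    if state["R"] == 0:                                  # instruction cycle
        fetch_decode_execute(state)
        if state["IEN"] and (state["FGI"] or state["FGO"]):
            state["R"] = 1                               # take interrupt next
    else:                                                # interrupt cycle
        state["M"][0] = state["PC"]                      # M[0] <- PC
        state["PC"] = 1                                  # branch to location 1
        state["IEN"] = state["R"] = 0                    # block further interrupts

state = {"R": 0, "IEN": 1, "FGI": 1, "FGO": 0, "PC": 20, "M": [0] * 4096}
clock(state)                          # instruction cycle; R becomes 1
clock(state)                          # interrupt cycle: M[0] = 21, PC = 1
print(state["M"][0], state["PC"])     # 21 1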


Lecture – 25:

 Computer Registers
 Instruction set completeness
 Timing and control circuit

Another type of organization uses some processor registers other than the accumulator. Some important points to be noticed are:

• A processor has many registers to hold instructions, addresses, data, etc.
• The processor has a register, the Program Counter (PC), that holds the memory address of the next instruction to get
  – Since the memory in the Basic Computer only has 4096 locations, the PC only needs 12 bits
• In direct or indirect addressing, the processor needs to keep track of which locations in memory it is addressing: the Address Register (AR) is used for this
  – The AR is a 12-bit register in the Basic Computer
• When an operand is found, using either direct or indirect addressing, it is placed in the Data Register (DR). The processor then uses this value as data for its operation
• The Basic Computer has a single general-purpose register, the Accumulator (AC)
• The significance of a general-purpose register is that it can be referred to in instructions, e.g. load AC with the contents of a specific memory location; store the contents of AC into a specified memory location
• Often a processor will need a scratch register to store intermediate results or other temporary data; in the Basic Computer this is the Temporary Register (TR)
• The Basic Computer uses a very simple model of input/output (I/O) operations
• Input devices are considered to send 8 bits of character data to the processor
• The processor can send 8 bits of character data to output devices
• The Input Register (INPR) holds an 8-bit character gotten from an input device
• The Output Register (OUTR) holds an 8-bit character to be sent to an output device

The organization of these basic registers looks like:


The data registers are 16 bits long and the address registers are 12 bits long.

Common Bus System

The common bus system deals with how these various registers are connected and how they interact with each other.

• The registers in the Basic Computer are connected using a bus
• This gives a savings in circuitry over complete connections between registers

That means that if we used a general connection system, i.e. connected each register with every other register, the design would be very complex and a large number of connections would be required.

Fig: Basic Computer registers and memory (PC and AR are 12 bits; IR, TR, DR and AC are 16 bits; INPR and OUTR are 8 bits; memory is 4096 x 16)

Fig: Basic Computer registers connected to a 16-bit common bus (each register has LD/INR/CLR inputs; select lines 1-7 gate AR, PC, DR, AC, IR, TR or the memory word onto the bus; the ALU with carry flip-flop E feeds AC; INPR feeds the ALU and OUTR loads from the bus)
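A sketch of the bus selection, assuming the select encoding shown in the figure (1 = AR, 2 = PC, 3 = DR, 4 = AC, 5 = IR, 6 = TR, 7 = memory word):

def bus(select, AR, PC, DR, AC, IR, TR, memory_word):
    # A 3-bit select (S2 S1 S0) gates exactly one source onto the 16-bit bus.
    sources = {1: AR, 2: PC, 3: DR, 4: AC, 5: IR, 6: TR, 7: memory_word}
    return sources[select] & 0xFFFF

value = bus(2, AR=0, PC=0x0A5, DR=0, AC=0, IR=0, TR=0, memory_word=0)
print(hex(value))   # 0xa5 -- PC is on the bus, e.g. for AR <- PC at T0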


• In the Basic Computer, there is only one general-purpose register, the Accumulator (AC)
• In modern CPUs, there are many general-purpose registers
• It is advantageous to have many registers
  – Transfers between registers within the processor are relatively fast
  – Going "off the processor" to access memory is much slower

Instruction Set Completeness

A computer should have a set of instructions so that the user can construct machine language programs to evaluate any function that is known to be computable.

Instruction Types
Functional Instructions - arithmetic, logic, and shift instructions - ADD, CMA, INC, CIR, CIL, AND, CLA
Transfer Instructions - data transfers between the main memory and the processor registers - LDA, STA
Control Instructions - program sequencing and control - BUN, BSA, ISZ
Input/Output Instructions - input and output - INP, OUT


Flowchart for complete computer operations:

Fig: Flowchart for complete computer operations. Start with SC ← 0, IEN ← 0, R ← 0. While R = 0, run the instruction cycle (R'T0: AR ← PC; R'T1: IR ← M[AR], PC ← PC + 1; R'T2: AR ← IR(0~11), I ← IR(15), decode IR(12~14) into D0...D7); while R = 1, run the interrupt cycle (RT0: AR ← 0, TR ← PC; RT1: M[AR] ← TR, PC ← 0; RT2: PC ← PC + 1, IEN ← 0, R ← 0, SC ← 0). At T3: D7'I'T3 does nothing (direct address), D7'IT3 performs AR ← M[AR] (indirect address), D7I'T3 executes a register-reference instruction, and D7IT3 executes an I/O instruction; memory-reference execution starts at D7'T4.


Lecture – 26:

 Instruction Cycle
o Flowchart for determining the type of instruction
o Timing and control circuit
o Timing Signals

We have the instruction cycle as fetch, decode and execute. At the time of decoding the instruction, we find out the type of instruction. In this section we will discuss the flowchart and the corresponding circuit used to do so.

Flowchart for determining the type of instruction:

Fig: Flowchart for determining the type of instruction. T0: AR ← PC; T1: IR ← M[AR], PC ← PC + 1; T2: decode the opcode in IR(12-14), AR ← IR(0-11), I ← IR(15). If D7 = 0 (memory-reference): at T3, AR ← M[AR] when I = 1 (indirect) or nothing when I = 0 (direct), and the memory-reference instruction executes from T4 with SC ← 0. If D7 = 1: at T3 a register-reference (I = 0) or input-output (I = 1) instruction is executed with SC ← 0.


Control unit of Basic Computer

In this circuit we explain how the instruction is fetched into IR, and how bits 12, 13 and 14 are decoded to check the type of instruction. To take this decision, we use combinational control logic together with the additional mode bit information. After the type of instruction is checked, the corresponding control signals are generated.

To synchronize the fetch, decode and execute phases of the instruction cycle, we use a timing circuit. This contains a 4-bit sequence counter (SC) whose output is converted by a 4 x 16 decoder into 16 timing signals. The SC is cleared for every instruction and incremented through the various phases, so that when a new instruction is fetched the timing signals start back from T0.

D7'IT3: AR ← M[AR]
D7'I'T3: Nothing
D7I'T3: Execute a register-reference instruction
D7IT3: Execute an input-output instruction

Fig: Control unit of the Basic Computer (IR(12-14) feeds a 3 x 8 decoder producing D0-D7, IR(15) gives I, and IR(0-11) goes to the control logic; a 4-bit sequence counter with INR/CLR inputs drives a 4 x 16 decoder producing T0-T15; the combinational control logic combines these with other inputs to produce the control signals)



To explain this further, we take the example of the instruction STA, which executes at D3T4.

- The timing signals are generated by the 4-bit sequence counter and the 4 x 16 decoder
- The SC can be incremented or cleared
- Example: T0, T1, T2, T3, T4, T0, T1, ... Assume that at time T4, SC is cleared to 0 if decoder output D3 is active.

Fig: Timing diagram (the clock advances SC through T0, T1, T2, T3, T4; when D3 is active at T4, CLR SC is raised, i.e. D3T4: SC ← 0, and the sequence restarts at T0)
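A sketch of this counter-and-decoder behaviour (the 4 x 16 decoder is modelled simply by reporting which timing signal is currently active):

SC = 0   # 4-bit sequence counter

def next_timing_signal(D3_active):
    global SC
    T = SC                        # decoder output: exactly T(SC) is high
    if D3_active and T == 4:      # D3T4: SC <- 0
        SC = 0
    else:
        SC = (SC + 1) % 16        # INR: advance to the next timing signal
    return T

print([next_timing_signal(D3_active=True) for _ in range(7)])
# [0, 1, 2, 3, 4, 0, 1] -- T0..T4, then the sequence restarts at T0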


Lecture – 27:

 Control Memory
o Its Organization
 Mapping Logic
 Microprogram Example

Control Memory:

The function of the control unit in a digital computer is to initiate sequences of microoperations, and the number of these microoperations determines the complexity of the digital system.

These control signals can be hardwired (using conventional logic design techniques) or microprogrammed. Generally, a control function is a binary variable that can be in either the 1 state or the 0 state, depending on the application, and a set of such variables can be represented by a string of 1's and 0's called a control word.

A control unit whose binary control variables are stored in memory is called a microprogrammed control unit.

Each word in a control memory contains a microinstruction, which specifies a set of microoperations. A sequence of microinstructions constitutes a microprogram.

Since alterations are not required once the control unit is in operation, the control memory can be a static memory or ROM (read-only memory).

We can also use the technique of dynamic microprogramming, in which the memory can be written (to change the microprogram) but is used mostly for reading. This type of memory is also called writable control memory.

Thus we can say: a memory that is part of a control unit is known as a control memory.

A computer having a microprogrammed control unit has 2 separate memories:

Main Memory: This is used for storing programs, which can be altered.

Control Memory: This holds a fixed microprogram that cannot be altered by the occasional user; it specifies the various microinstructions that contain the internal control signals for the execution of register operations.

These microinstructions generate the microoperations to:
 Fetch the instruction from memory
 Evaluate the effective address
 Execute the operation specified by the instruction


 Return control to the fetch phase in order to repeat the cycle for the next instruction

Configuration of a micro programmed control unit:

Fig: Micro programmed Control Unit

The control unit is assumed to be a ROM, within which all control information is permanently stored. The control memory address register (CAR) contains the address of the microinstruction, and the control data register holds the microinstruction read from memory. The microinstruction contains a control word that specifies one or more microoperations for the data processor. Once these microoperations are executed, we need the location of the next microinstruction, which may also depend on external input. To find the next address, we use a next-address generator, also called a sequencer, as it determines the address sequence that is read from control memory.

The typical functions of a microprogram sequencer are:
 Incrementing the CAR by 1 (in case of sequential execution)
 Loading into the CAR an address from control memory (in case of branching)
 Transferring an external address (in case of interrupts)
 Loading an initial address to start the control operations (in case of the first microoperation)

The control data register holds the present microinstruction while the next address is computed and read from memory. It is also called a pipeline register, as it allows the execution of the microoperations simultaneously with the generation of the next microinstruction. It requires a two-phase clock, with one phase applied to the address register and one to the data register.

We can also work without the control data register, using a single-phase clock, in which case the control word and the next-address information are taken directly from the control memory. The ROM operates as a combinational circuit, with the address value as the input and the corresponding word as the output; the content of the specified word remains on the output as long as the address value stays in the address register.

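A sketch of one micro-step under these assumptions (the control words are placeholder strings, and the three address choices model the sequencer cases listed above):

control_memory = {0: "fetch-1", 1: "fetch-2", 2: "decode", 64: "ADD-1"}

def micro_step(CAR, branch_target=None, external=None):
    control_word = control_memory[CAR]    # read the microinstruction
    if external is not None:
        next_CAR = external               # e.g. mapped OP-code address
    elif branch_target is not None:
        next_CAR = branch_target          # branch within the microprogram
    else:
        next_CAR = CAR + 1                # sequential: CAR <- CAR + 1
    return control_word, next_CAR

word, CAR = micro_step(0)                 # "fetch-1", CAR -> 1
word, CAR = micro_step(CAR)               # "fetch-2", CAR -> 2
word, CAR = micro_step(CAR, external=64)  # map the OP-code to its routine
print(word, CAR)                          # decode 64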


The main advantage of microprogrammed control is the fact that once the hardware configuration is established, there should be no need for further hardware or wiring changes. The only thing that changes is the microprogram residing in the control memory.

Mapping Of instructions:

Mapping from the OP-code of an instruction to the address of the microinstruction that is the starting microinstruction of its execution microprogram.

Here we have to generate the address of the microinstruction from the instruction. We fetch the instruction and take the OP-code of that particular instruction. For the mapping, we copy the OP-code bits directly into the microinstruction address, but we append some bits at the start and at the end. Which values are appended is entirely the designer's decision; in this example we have appended a 0 before the copied OP-code bits and 00 at the end, and this mapping rule is generalized for all OP-codes/instructions. Note: the number of bits appended at the end determines the maximum length of each microprogram routine (two appended bits allow 2^2 = 4 microinstructions per routine).

A later diagram shows the mapping of the various instructions to their particular microinstructions or microprograms.

Fig: Mapping of a machine instruction to a microinstruction address (OP-code 1011 with the mapping rule 0 xxxx 00 gives the microinstruction address 0101100)
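Under this appending rule, the mapping reduces to a two-bit left shift; a one-line sketch:

def map_opcode(opcode: int) -> int:
    # 7-bit microinstruction address: 0 | OP-code | 00, i.e. opcode << 2.
    # The two appended 0 bits give each routine up to four microinstructions.
    assert 0 <= opcode <= 0b1111
    return opcode << 2

print(format(map_opcode(0b1011), "07b"))   # 0101100, as in the figure above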


MICROPROGRAM EXAMPLE

Computer Hardware Configuration

This type of configuration contains two memory units:
 Main memory, for storing instructions and data
 Control memory, for storing the microinstructions/microprogram

In this configuration, main memory is accessed with the help of PC, AR and DR, and the transfer of information takes place through multiplexers (MUX) instead of a common bus. Similarly, control memory is accessed with the help of CAR, and the manipulation of the address sequencing is helped by SBR (the subroutine register). The control signals fetched from control memory then drive the arithmetic, logic and shift unit, which takes its values from DR and AC and stores the result in AC.

Fig: Computer hardware configuration for the microprogram example (main memory 2048 x 16 addressed through a MUX from PC or AR; control memory 128 x 20 addressed by CAR, with SBR for subroutine return; the arithmetic, logic and shift unit combines DR and AC)


Mapping function implemented by ROM or PLD

The mapping function is sometimes implemented by means of an integrated circuit called a programmable logic device, or PLD. This is similar to a ROM, and here the mapping function is expressed in terms of Boolean expressions that are implemented with the PLD.

Fig: Direct mapping (OP-codes 0000 through 0100 of ADD, AND, LDA, STA and BUN select the corresponding routine in control storage)

Fig: Mapping by ROM or PLD (mapping bits 10 xxxx 010 place the ADD, AND, LDA, STA and BUN routines at control-memory addresses 10 0000 010 through 10 0100 010; the mapping memory output feeds the control address register)


Lecture – 30:

Direct Memory Access

Direct memory access is used for block transfers of data to or from high-speed devices such as drums, disks and tapes.

* DMA controller - an interface which allows I/O transfers to take place directly between memory and the device, freeing the CPU for other tasks
* The CPU initializes the DMA controller by sending the memory address and the block size (number of words)

Block Diagram of DMA controller

Starting an I/O - the CPU executes instructions to:
 Load the Memory Address Register
 Load the Word Counter
 Load the Function (Read or Write) to be performed
 Issue a GO command

Upon receiving a GO command, the DMA controller performs the I/O operation independently of the CPU, following the sequences given after the figures below.

Fig: CPU bus signals for DMA transfer (the CPU's address bus, data bus, RD and WR lines go to high impedance when bus grant BG is enabled; BR = bus request, BG = bus grant)

Fig: Block diagram of the DMA controller (address register, word count register and control register on an internal bus; DS = DMA select, RS = register select; RD/WR, BR/BG and interrupt lines to the CPU; DMA request and DMA acknowledge lines to the I/O device)


Input
[1] Input Device ← R (read control signal)
[2] Buffer (DMA controller) ← input byte; the bytes are assembled into a word until the word is full
[3] M ← memory address, W (write control signal)
[4] Address Reg ← Address Reg + 1; WC (Word Counter) ← WC - 1
[5] If WC = 0, then interrupt to acknowledge done; else go to [1]

Output
[1] M ← memory address, R (read control signal); Address Reg ← Address Reg + 1, WC ← WC - 1
[2] Disassemble the word
[3] Buffer ← one byte; Output Device ← W, for all disassembled bytes
[4] If WC = 0, then interrupt to acknowledge done; else go to [1]
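A sketch of the input-transfer loop (bus arbitration and byte-into-word assembly are abstracted away; the word list stands in for the device):

def dma_input(memory, device_words, address_reg, word_count):
    while word_count > 0:
        memory[address_reg] = device_words.pop(0)   # M[address] <- word
        address_reg += 1                            # Address <- Address + 1
        word_count -= 1                             # WC <- WC - 1
    return "interrupt: transfer complete"           # WC = 0 -> interrupt

mem = [0] * 16
print(dma_input(mem, [11, 22, 33], address_reg=4, word_count=3))
print(mem[4:7])   # [11, 22, 33]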

While DMA I/O takes place, CPU is also executing instructions

DMA Controller and CPU both access Memory -> Memory Access Conflict

Memory Bus Controller

- Coordinates the activities of all devices requesting memory access
- Priority system

Memory accesses by CPU and DMA Controller are interwoven, with the top priority given to DMA Controller

-> Cycle Stealing

Cycle Steal

- The CPU is usually much faster than I/O (DMA), thus the CPU uses most of the memory cycles
- The DMA controller steals memory cycles from the CPU
- For those stolen cycles, the CPU remains idle
- For a slow CPU, the DMA controller may steal most of the memory cycles, which may cause the CPU to remain idle for a long time


DMA TRANSFER

Fig: DMA transfer in a computer system (the DMA controller sits between the CPU and the random-access memory unit, sharing the address bus, data bus and read/write control lines; BR/BG handle bus request and grant, DS/RS select the controller and its registers, and DMA request/acknowledge lines connect to the I/O peripheral device)


Lecture – 30:

 Interrupts
o Types of interrupts
 Interrupt cycle

Types of Interrupts:

External interrupts
External interrupts are initiated from outside the CPU and memory:
- I/O device → data transfer request or data transfer complete
- Timing device → timeout
- Power failure
- Operator

Internal interrupts (traps)
Internal interrupts are caused by the currently running program:
- Register or stack overflow
- Divide by zero
- OP-code violation
- Protection violation

Software interrupts
Both external and internal interrupts are initiated by the computer hardware; software interrupts are initiated by executing an instruction:
- Supervisor call → switches from user mode to supervisor mode → allows execution of a certain class of operations which are not allowed in user mode

Interrupt Procedure:
- The interrupt is usually initiated by an internal or an external signal, rather than by the execution of an instruction (except for the software interrupt)
- The address of the interrupt service program is determined by the hardware, rather than by the address field of an instruction
- An interrupt procedure usually stores all the information necessary to define the state of the CPU, rather than storing only the PC

The state of the CPU is determined from:
 Content of the PC


 Content of all processor registers
 Content of status bits

There are many ways of saving the CPU state, depending on the CPU architecture.

Flowchart of interrupts:

To explain more about interrupts and the role of the IEN flag, let's discuss the interrupt cycle.

R is the interrupt flip-flop, which tells us whether the current execution is a normal fetch or an interrupt. Thus we check whether R = 0 or not. If R is 0, this is the case of the normal instruction cycle: we fetch, decode and execute the instruction, and in parallel check whether there is an interrupt or not. The system will only accept interrupts if IEN is 1; thus, if IEN is 0, there is no chance of an interrupt cycle. If IEN is 1, then we check the flags FGI and FGO: if both are 0, no device needs service, so no interrupt is taken; if either is 1, we can go for the interrupt, setting R to 1 and continuing with the interrupt cycle.

In case of an interrupt, we have to store somewhere the address of the next instruction that would have come next in normal execution. We store it at memory address 0 and set the value of PC to 1; we also clear IEN and R to 0 to prevent further interrupts until this interrupt cycle is completed.

Fig: Flowchart for the interrupt cycle (instruction cycle while R = 0; when IEN = 1 and FGI or FGO = 1, R ← 1; the interrupt cycle then stores the return address with M[0] ← PC, branches to location 1 with PC ← 1, and clears IEN and R)
