
Computer Architecture

Chang-Bum Lee

Dept. of Computer Engineering, Youngsan University


Course Content(1)


Lecture #1 — Course Overview

• Course Contents
• Course Schedule
• Grading Guidelines
• Test and Assignments

Lecture #2 — Basic Architecture of Computer

• Basic Architecture
• System Configuration

Lecture # 3 — Instruction Execution

• Fetch Cycle
• Execution Cycle
• Interrupt Cycle

Course Content(2)

Lecture #4, 5— Instruction Set

• Program Control
• Instruction Formats
• Addressing Modes
• Pentium Processors

Lecture #6, 7— Arithmetic and Logical Operations

• Arithmetic and Logical Unit
• Integer Representation
• Logic Operations
• Shift Operations
• Arithmetic Operations of Integers (Addition, Subtraction, Multiplication, and Division)


Course Content(3)

Lecture #8, 9 – Real Numbers

• Representation of Floating Point Numbers
• Arithmetic Operations of Floating Point Numbers (Addition, Subtraction, Multiplication, and Division)

Lecture #10— Control Unit

• Structure of Control Unit
• Microinstruction
• Microprogram

Lecture #11— Memory Devices

• Memory Hierarchy
• RAM
• ROM
• Design of Memory Device Modules


Course Content(4)

Lecture #12 – Cache Memory

• Cache Size
• Fetch Method
• Mapping


Computer Architecture: Course Overview

Lecture #1


Course Objectives


Understand role & relationship of hardware and software

Exposure to:
– Machine organization
– Assembly language programming
– C programming

Be able to build an entire (slow) computing system, both hardware and software

Be distinguished from mere programmers


Course Schedule

The complete course, including lectures and seminars, will be covered in 90 hours (15 weeks).

The total duration of the course will be 4 months.

Lectures: 3 hours weekly (2 hours + 1 hour)


Grading Guidelines

Attendance: 20% (depending on students' class participation)

Final Exam: 40% (textbook-based, in-class final exam)

Midterm Exam: 30% (textbook-based, in-class midterm exam)

Assignments: 10% (based on submitted assignments)


Course References

Textbook: Computer Architecture / Jong-Hyun Kim (Sang Lung Publishing Corp.)

The course slides will be available at

http://prof.ysu.ac.kr/blog/postlist.asp?b_id=cblee


Course Summary


Introduction to computer architecture
– How is data represented?
– What are the pieces of a computer?
– How do computers work?

Programming
– How do I "talk" directly to the machine?
– How do I program in C?

Computer Systems and Computation
– How do simple HW/SW elements come together to realize complex computations?


Computer Architecture: Basic Architecture

Lecture #2


Introduction - Architecture (1)

Architecture is those attributes visible to the programmer:
– Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
– e.g. Is there a multiply instruction?

Organization is how features are implemented:
– Control signals, interfaces, memory technology.
– e.g. Is there a hardware multiply unit, or is multiplication done by repeated addition?


Introduction - Architecture (2)

All members of the Intel x86 family share the same basic architecture.

The IBM System/370 family shares the same basic architecture.

This gives code compatibility (at least backward compatibility).

Organization differs between different versions.


Structure & Function

Structure is the way in which components relate to each other.

Function is the operation of individual components as part of the structure.

All computer functions are:
– Data processing
– Data storage
– Data movement
– Control


ENIAC

• Electronic Numerical Integrator And Computer
• Eckert and Mauchly, University of Pennsylvania
• Built to compute trajectory tables for weapons
• Started 1943, finished 1946
– Too late for the war effort
• Used until 1955
– Decimal (not binary)
– 20 accumulators of 10 digits
– Programmed manually by switches
– 18,000 vacuum tubes, 30 tons
– 15,000 square feet
– 140 kW power consumption
– 5,000 additions per second


Structure of von Neumann Machine

• Stored-program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing them
• Input and output equipment operated by the control unit
• Princeton Institute for Advanced Study (IAS) computer, completed 1952


Transistor Based Computers

Transistors
– Replaced vacuum tubes
– Smaller
– Cheaper
– Less heat dissipation
– Solid-state device
– Made from silicon (sand)
– Invented 1947 at Bell Labs
– William Shockley et al.

Transistor-based computers
– Second-generation machines
– NCR & RCA produced small transistor machines
– IBM 7000 series
– DEC (founded 1957) produced the PDP-1


Speeding It Up & Performance Mismatch

Speeding it up
– Pipelining
– On-board cache (L1 & L2 cache)
– Branch prediction
– Data flow analysis
– Speculative execution

Performance mismatch
– Processor speed increased
– Memory capacity increased
– Memory speed lags behind processor speed


Solutions

Increase the number of bits retrieved at one time.
– Make DRAM “wider” rather than “deeper”

Change the DRAM interface.
– Cache

Reduce the frequency of memory access.
– More complex cache and cache on chip

Increase interconnection bandwidth.
– High-speed buses
– Hierarchy of buses


Program Concept

Hardwired systems are inflexible.

General-purpose hardware can do different tasks, given the correct control signals.

Instead of re-wiring, supply a new set of control signals.

A program is a sequence of steps:
– For each step, an arithmetic or logical operation is done.
– For each operation, a different set of control signals is needed.


Computer Components

The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit.

Data and instructions need to get into the system, and results need to get out.
– Input/output

Temporary storage of code and results is needed.
– Main memory


Computer Architecture: CPU Structures and Functions

Lecture #3


CPU Structure

The CPU must:
– Fetch instructions
– Interpret instructions
– Fetch data, process data, and write data

Registers
– The CPU must have some working space (temporary storage)
– Number and function vary between processor designs
– One of the major design decisions
– Top level of the memory hierarchy

Control Unit
– Coordinates the sequence of execution steps

ALU
– Performs arithmetic and logical processing

[Figure: registers, ALU, and control unit connected by the CPU internal bus; the CPU connects to the system bus (address bus, data bus, control bus)]


CPU Structure

[Figure: the instruction set as the interface between software and hardware]


Fetch Cycle (1)

• The Program Counter (PC) holds the address of the next instruction to fetch.
• The processor fetches the instruction from the memory location pointed to by the PC.
• The PC is incremented (unless told otherwise).
• The instruction is loaded into the Instruction Register (IR):
– t0: MAR <- PC
– t1: MBR <- M[MAR], PC <- PC+1
– t2: IR <- MBR
• The processor interprets the instruction and performs the required actions.


Fetch Cycle (2)

Micro-operations:
– t0: MAR <- PC
– t1: MBR <- M[MAR], PC <- PC+1
– t2: IR <- MBR

[Figure: address and instruction flow in the fetch cycle between the control unit and the memory devices over the address, data, and control buses]


Execute Cycle (1)

• Processor-memory
– Data transfer between CPU and main memory
• Processor-I/O
– Data transfer between CPU and I/O module
• Data processing
– Some arithmetic or logical operation on data
• Control
– Alteration of the sequence of operations
– e.g. jump
• Combination of the above


Execute Cycle (2)

Example: LOAD addr
– t0: MAR <- IR(addr)
– t1: MBR <- M[MAR]
– t2: AC <- MBR

Other examples: STA addr, ADD addr

[Figure: data flow in the execute cycle between the control unit and the memory devices over the address, data, and control buses]


Interrupt Cycle

• Added to the instruction cycle
• The processor checks for an interrupt
– Indicated by an interrupt signal
• If there is no interrupt, fetch the next instruction
• If an interrupt is pending:
– Suspend execution of the current program
– Save context
– Set the PC to the start address of the interrupt handler routine
– Process the interrupt
– Restore context and continue the interrupted program


Multiple Interrupts (1)

Disable interrupts
– The processor ignores further interrupts while processing one interrupt
– Interrupts remain pending and are checked after the first interrupt has been processed
– Interrupts are handled in sequence as they occur


Multiple Interrupts (2)

Define priorities
– Low-priority interrupts can be interrupted by higher-priority interrupts
– When a higher-priority interrupt has been processed, the processor returns to the previous interrupt


Indirect Cycle

• May require memory access to fetch operands
• Indirect addressing requires more memory accesses
• Can be thought of as an additional instruction subcycle


Prefetch

• Fetch accesses main memory
• Execution usually does not access main memory
• So the next instruction can be fetched during execution of the current instruction
• This is called instruction prefetch


Improved Performance

Performance improves, but it is not doubled:
– Fetch is usually shorter than execution (so prefetch more than one instruction?)
– Any jump or branch means that prefetched instructions are not the required instructions

Add more stages to improve performance.


Pipelining

• Fetch instruction
• Decode instruction
• Calculate operands (i.e. EAs)
• Fetch operands
• Execute instruction
• Write result

Overlap these operations.


Two Stage Instruction Pipeline

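[Figure: two-stage instruction pipeline with fetch and execute stages. (a) Simplified view: the fetch stage passes each instruction to the execute stage, which produces the result. (b) Expanded view: when execution produces a new address (branch), the prefetched instruction is discarded and the stages wait.]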


Memory Connection

• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
– Read
– Write
– Timing


Input/Output Connection

Similar to memory from the computer's viewpoint

Output
– Receive data from computer
– Send data to peripheral

Input
– Receive data from peripheral
– Send data to computer

Receive control signals from computer

Send control signals to peripherals
– e.g. spin disk

Receive addresses from computer
– e.g. port number to identify peripheral

Send interrupt signals (control)

CPU Connection

• Reads instructions and data
• Writes out data (after processing)
• Sends control signals to other units
• Receives (and acts on) interrupts

Buses
– There are a number of possible interconnection systems
– Single and multiple bus structures are most common
– e.g. Control/Address/Data bus (PC)
– e.g. Unibus (DEC PDP)


What is a Bus?

A communication pathway connecting two or more devices

Usually broadcast

Often grouped
– A number of channels in one bus
– e.g. a 32-bit data bus is 32 separate single-bit channels

Power lines may not be shown


Data Bus and Address Bus

Data Bus
– Carries data (remember that there is no difference between “data” and “instruction” at this level)
– Width is a key determinant of performance: 8, 16, 32, 64 bits

Address Bus
– Identifies the source or destination of data
– e.g. the CPU needs to read an instruction (data) from a given location in memory
– Bus width determines the maximum memory capacity of the system
– e.g. the 8080 has a 16-bit address bus, giving a 64K address space


Control Bus

Control and timing information
– Memory read/write signal
– Interrupt request
– Clock signals


Single Bus Problems

Lots of devices on one bus leads to:
– Propagation delays: long data paths mean that coordination of bus use can adversely affect performance
– A bottleneck if aggregate data transfer approaches bus capacity

Most systems use multiple buses to overcome these problems.


Bus Types and Arbitration

Bus Types
– Dedicated: separate data and address lines
– Multiplexed: shared lines, with an address-valid or data-valid control line
+ Advantage: fewer lines
+ Disadvantages: more complex control; lower ultimate performance

Bus Arbitration
– More than one module may control the bus, e.g. the CPU and the DMA controller
– Only one module may control the bus at one time
– Arbitration may be centralised or distributed


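Timing

Co-ordination of events on the bus

Synchronous
– Events are determined by clock signals
– The control bus includes a clock line
– A single 1-0 clock period is a bus cycle
– All devices can read the clock line
– Usually sync on the leading edge
– Usually a single cycle for an event

Asynchronous
– The occurrence of one event follows and depends on a previous event (e.g. read and write handshakes)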


Memory Hierarchy & Physical Types

Memory Hierarchy
– Registers: exist in the CPU
– Internal or main memory: may include one or more levels of cache; mainly “RAM”
– External memory: backing store

Physical Types
– Semiconductor types are mainly RAM
– Magnetic types are disk and tape
– Optical types are CD and DVD
– Others are bubble, hologram, etc.


Performance

Access time
– Time between presenting the address and getting the valid data

Memory cycle time
– Time may be required for the memory to “recover” before the next access
– Cycle time is access + recovery

Transfer rate
– Rate at which data can be moved


Instruction Representation

In machine code, each instruction has a unique bit pattern.

For human consumption (well, programmers anyway), a symbolic representation is used.
– e.g. ADD, SUB, LOAD

Operands can also be represented in this way.
– ADD A,B


Computer Architecture: Instruction Types and Addressing Modes

Lecture #4, #5


Instruction Format and Types

Simple Instruction Format (16 bits):

Opcode (4 bits) | Operand Reference (6 bits) | Operand Reference (6 bits)

Instruction Types
– Data processing
– Data storage (main memory)
– Data movement (I/O)
– Program flow control


Number of Addresses (1)

3 addresses
– Operand 1, Operand 2, Result
– a = b + c;
– May be a fourth: the next instruction (usually implicit)
– Not common
– Needs very long words to hold everything

2 addresses
– One address doubles as operand and result
– a = a + b
– Reduces the length of the instruction
– Requires some extra work: temporary storage to hold some results

1 address
– Implicit second address
– Usually a register (accumulator)
– Common on early machines

0 (zero) addresses
– All addresses implicit
– Uses a stack, e.g. push a; push b; add; pop c
– Computes c = a + b


Design Decisions (1)

Operation repertoire
– How many ops?
– What can they do?
– How complex are they?

Data types

Instruction formats
– Length of op code field
– Number of addresses


Addressing Modes

• Immediate
• Direct
• Indirect
• Register
• Register Indirect
• Displacement (Indexed)
• Stack


Immediate Addressing

• The operand is part of the instruction
• Operand = address field
• e.g. ADD 5
– Add 5 to the contents of the accumulator
– 5 is the operand
• No memory reference to fetch data
• Fast
• Limited range


Immediate Addressing Diagram

[Figure: instruction = Opcode | Operand]


Direct Addressing

• The address field contains the address of the operand.
• Effective address (EA) = address field (A)
• e.g. ADD A
– Add the contents of address A to the accumulator
• Single memory reference to access data
• No additional calculations to work out the effective address
• Limited address space


Direct Addressing Diagram

[Figure: instruction = Opcode | Address A; A points to the operand in memory]


Indirect Addressing

• The memory cell pointed to by the address field contains the address of (pointer to) the operand.
• EA = (A)
– Look in A, find address (A), and look there for the operand.
• e.g. ADD (A)
– Add the contents of the cell pointed to by the contents of A to the accumulator.
• Large address space: $2^n$, where n = word length
• May be nested, multilevel, cascaded
– e.g. EA = (((A))) (draw the diagram yourself)
• Multiple memory accesses to find the operand, hence slower


Indirect Addressing Diagram

[Figure: instruction = Opcode | Address A; memory cell A holds the pointer to the operand, which is also in memory]


Register Addressing (1)

• The operand is held in the register named in the address field.
• EA = R
• Limited number of registers
• Very small address field needed
– Shorter instructions
– Faster instruction fetch


Register Addressing (2)

• No memory access
• Very fast execution
• Very limited address space
• Multiple registers help performance
– Requires good assembly programming or compiler writing
– N.B. in C programming: register int a;
• c.f. direct addressing


Register Addressing Diagram

[Figure: instruction = Opcode | Register Address R; register R holds the operand]


Register Indirect Addressing

• c.f. indirect addressing
• EA = (R)
• The operand is in the memory cell pointed to by the contents of register R
• Large address space ($2^n$)
• One fewer memory access than indirect addressing


Register Indirect Addressing Diagram

[Figure: instruction = Opcode | Register Address R; register R holds the pointer to the operand in memory]


Displacement Addressing

• EA = A + (R)
• The address field holds two values
– A = base value
– R = register that holds the displacement
– or vice versa


Displacement Addressing Diagram

[Figure: instruction = Opcode | Register R | Address A; the contents of register R are added to A to give the address of the operand in memory]


Relative Addressing

• A version of displacement addressing
• R = program counter (PC)
• EA = A + (PC)
• i.e. get the operand from A cells away from the current location pointed to by the PC
• c.f. locality of reference and cache usage


Base-Register Addressing

• A holds the displacement
• R holds a pointer to the base address
• R may be explicit or implicit
• e.g. segment registers in 80x86


Indexed Addressing

• A = base
• R = displacement
• EA = A + R
• Good for accessing arrays
– EA = A + R
– R++


Combinations

Postindex: EA = (A) + (R)

Preindex: EA = (A + (R))

(Draw the diagrams)


Stack Addressing

• The operand is (implicitly) on top of the stack
• e.g. ADD
– Pop the top two items from the stack and add


Pentium Addressing Modes

• The virtual or effective address is an offset into a segment.
– Starting address plus offset gives the linear address.
– This goes through page translation if paging is enabled.

• 12 addressing modes available
– Immediate
– Register operand
– Displacement
– Base
– Base with displacement
– Scaled index with displacement
– Base with index and displacement
– Base scaled index with displacement
– Relative


Instruction Types

Instructions generally fall into four types:
– Data processing
– Data storage (main memory)
– Data movement (I/O)
– Program flow control




Design Decisions (2)

Registers
– Number of CPU registers available
– Which operations can be performed on which registers?

Addressing modes (later…)

RISC v CISC


Types of Operation

There are several types of operations:
– Data transfer
– Arithmetic
– Logical
– Conversion
– I/O
– System control
– Transfer of control


Arithmetic

• Arithmetic operations include add, subtract, multiply, and divide.
• They can operate on signed integers.
• Can arithmetic operations process floating point?
• May include:
– Increment (a++)
– Decrement (a--)
– Negate (-a)


Shift and Rotate Operations

• Logical right shift
• Logical left shift
• Arithmetic right shift
• Arithmetic left shift
• Right rotate
• Left rotate


Logical and Conversion

Logical
– Bitwise operations
– Logical operations are AND, OR, NOT, etc.

Conversion
– e.g. binary to decimal


Input/Output

• May be specific instructions.
• May be done using data movement instructions (memory-mapped I/O).
• May be done by a separate controller (DMA).


Transfer of Control

Branch
– e.g. branch to x if the result is zero

Skip
– e.g. increment and skip if zero
– ISZ Register1: skip if zero
– Branch xxxx

Subroutine call
– c.f. interrupt call: jump to the interrupt service routine


Branch Instruction

Unconditional branch
– Jump to 211 unconditionally.

Conditional branch 1
– Jump to 211 if the accumulator is zero.

Conditional branch 2
– Jump to 235 if R1 equals R2.


Nested Procedure Calls

If the main program calls procedure 1 (Proc.1), control goes to Proc.1 and its body is executed.

If Proc.1 calls another procedure (Proc.2), control goes to Proc.2 and its body is executed.

When Proc.2 reaches a RETURN instruction, control returns to Proc.1.


Computer Architecture: Arithmetic and Logical Operations of Computer

Lecture #6, #7


Arithmetic & Logic UnitDoes the calculations.

Everything else in the computer is there to service this unit.

Handles integers.

May handle floating point (real) numbers.

May be separate FPU (maths co-processor).


Integer Representation

• Only 0 and 1 are available to represent everything
• Positive numbers are stored in binary
– e.g. 41 = 00101001
• There is no minus sign and no period (binary point)
• Signed numbers use sign-magnitude, one's complement, or two's complement representation


Sign-Magnitude

• The leftmost bit is the sign bit
– 0 means positive
– 1 means negative
• +18 = 00010010, -18 = 10010010
• Problems
– Need to consider both sign and magnitude in arithmetic
– Two representations of zero (+0 and -0)


Two's Complement

• +3 = 00000011, +2 = 00000010, +1 = 00000001, +0 = 00000000
• -1 = 11111111, -2 = 11111110, -3 = 11111101

Benefits
– Two's complement has one representation of zero.
– Arithmetic works easily (see later).
– Negating is fairly easy:
+ 3 = 00000011
+ Boolean complement gives 11111100
+ Add 1 to the LSB: 11111101 (= -3)


Logical Operations

• AND, OR, XOR, NOT
• Selective-set, selective-complement
• Masking, insert, compare

Bitwise operations
– Logical shift
– Circular shift
– Arithmetic shift
– Shift with carry


Shift and Rotate Operations


Addition and Subtraction

• Normal binary addition
• Monitor the sign bit for overflow
• Take the two's complement of the subtrahend and add it to the minuend
– i.e. a - b = a + (-b)
• So we only need addition and complement circuits


Hardware for Addition and Subtraction

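OF: overflow bit; SW: switch (selects addition or subtraction)

[Figure: registers A and B feed an adder; B passes through a complementer controlled by SW; the adder's result is written back to register A and overflow is recorded in OF]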


Multiplication

• Is complex
• Work out a partial product for each digit
• Take care with place value (column)
• Add the partial products


Multiplication Example

        1011    Multiplicand (11 dec)
      x 1101    Multiplier (13 dec)
        1011    Partial products
       0000
      1011
     1011
    10001111    Product (143 dec)

Note: if the multiplier bit is 1, copy the multiplicand (shifted to its place value); otherwise write zero.
Note: the result needs double length.


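Booth's Algorithm (flowchart)

1. START: A ← 0, Q-1 ← 0, M ← Multiplicand, Q ← Multiplier, Counter ← n
2. Examine Q0, Q-1:
– 10: A ← A - M
– 01: A ← A + M
– 00 or 11: no arithmetic operation
3. Arithmetic shift right of A, Q, Q-1
4. Counter ← Counter - 1
5. If Counter = 0, END; otherwise go back to step 2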


Division

• More complex than multiplication
• Negative numbers are really bad!
• Based on long division

Division of Unsigned Binary Integers

                 00001101   Quotient
  Divisor 1011 ) 10010011   Dividend
                  1011
                 001110     Partial remainders
                   1011
                   001111
                     1011
                      100   Remainder


Computer Architecture: Real Numbers

Lecture #8, #9


Real Numbers

Numbers with fractions

Could be done in pure binary:
– $1001.1010_2 = 2^3 + 2^0 + 2^{-1} + 2^{-3} = 9.625$

Where is the binary point?

Fixed?
– Very limited

Moving?
– How do you show where it is?


Floating Point

• $\pm .\text{significand} \times 2^{\text{exponent}}$
• The point is actually fixed between the sign bit and the body of the mantissa.
• The exponent indicates place value (point position).

[Figure: format fields: sign bit | biased exponent | mantissa]


Floating Point Examples

(a) 32-bit floating point format: S (1 bit) | E field (8 bits) | Mantissa field (23 bits)

(b) Example of a data representation:
– Sign (S) bit = 0
– Exponent (E) field = 00000101
– Mantissa (M) field = 1101 0000 0000 0000 0000 000
– Stored word: 0 00000101 1101 0000 0000 0000 0000 000


Signs for Floating Point

• Mantissa is stored in two's complement.
• Exponent is in excess (biased) notation.
– e.g. excess (bias) 128 means:
– 8-bit exponent field
– Pure value range 0-255
– Subtract 128 to get the correct value
– Range -128 to +127


Normalization

• FP numbers are usually normalized.
• i.e. the exponent is adjusted so that the leading bit (MSB) of the mantissa is 1.
• Since it is always 1, there is no need to store it.
• c.f. scientific notation, where numbers are normalized to give a single digit before the decimal point, e.g. $3.123 \times 10^3$


FP Ranges

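For a 32-bit number:
– 8-bit exponent: 256 exponent values, so representable magnitudes span a ratio of about $2^{256} \approx 1.2 \times 10^{77}$

Accuracy:
– The effect of changing the LSB of the mantissa
– 23-bit mantissa: $2^{-23} \approx 1.2 \times 10^{-7}$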


Expressible Numbers


IEEE 754

• Standard for floating point storage
• 32- and 64-bit standards
• 8- and 11-bit exponents, respectively


Floating Point Arithmetic

FP arithmetic +/-:
– Check for zeros
– Align significands (adjusting exponents)
– Add or subtract significands
– Normalize result

FP arithmetic ×/÷:
– Check for zero
– Add/subtract exponents
– Multiply/divide significands (watch sign)
– Normalize
– Round
– All intermediate results should be in double-length storage


Floating Point Multiplication


Computer Architecture: Control Unit

Lecture #10



Control Unit

Functions of the control unit
– Decoding of an instruction code
– Generation of control signals for instruction execution

Micro-instruction: a control word

Micro-program: a set of micro-instructions

Routine: a group of micro-instructions for a special function of the CPU, e.g. fetch cycle routine, execution cycle routine, interrupt cycle routine



Structure of Control Unit

Configuration elements
– Instruction decoder
– Control address register (CAR)
– Control memory: internal memory that stores the microprograms
– Control buffer register (CBR)
– Subroutine register (SBR)
– Sequencing module



Internal Structure of Control Unit

[Figure: internal structure of the control unit: instruction register, instruction decoder, CAR, control memory device, CBR, decoder, SBR, and sequencing module, with condition flags as input and internal and external control signals as outputs]



Internal Structure of the Control Memory Device

Example
– Capacity of CMD = 128 words
– The first half (addresses 0 ~ 63): stores common routines
– The second half (addresses 64 ~ 127): stores the execution routines of each instruction

[Figure: control memory map: fetch, indirect, and interrupt cycle routines in addresses 0 ~ 63; execution cycle routines 1, 2, … in addresses 64 ~ 127]



Mapping

[Figure: the instruction code is converted by a mapping function into an address in control memory]



Binary Codes and Symbols for Micro Operations (Examples)

Op field 1
Code  Micro-operation     Symbol
000   None                NOP
001   MAR <- PC           PCTAR
010   MAR <- IR(addr)     IRTAR
011   AC <- AC + MBR      ADD
100   MBR <- M[MAR]       READ
101   AC <- MBR           BRTAC
110   IR <- MBR           BRTIR
111   M[MAR] <- MBR       WRITE



Op field 2
Code  Micro-operation     Symbol
000   None                NOP
001   PC <- PC + 1        INCPC
010   MBR <- AC           ACTBR
011   MBR <- PC           PCTBR
100   PC <- MBR           BRTPC
101   MAR <- SP           SPTAR
110   AC <- AC - MBR      SUB
111   PC <- IR(addr)      IRTPC



Micro-programming

Fetch Cycle Routine

        ORG 0
FETCH:  PCTAR         U  JMP  NEXT   ; MAR <- PC; go to the next micro-instruction
        READ, INCPC   U  JMP  NEXT   ; MBR <- M[MAR], PC <- PC + 1; go to the next micro-instruction
        BRTIR         U  MAP         ; IR <- MBR; branch to the execution cycle routine



Indirect Cycle Routine

[Listing: micro-instruction routine of the indirect cycle, with its binary bit pattern, ending with a return to the execution cycle]



Execution Cycle Routine

[Table: each instruction, its op code, and the starting address of its execution routine]



Execution Cycle Routines for each instruction

[Listing: execution cycle routines for each instruction; each routine calls the indirect cycle routine if I=1]


Computer Architecture: Memory Devices

Lecture #11


Memory Classification

• Main memory: internal memory
• Auxiliary storage device: external memory


Memory Hierarchy

Registers
– In the CPU

Internal or main memory
– May include one or more levels of cache
– “RAM”

External memory
– Backing store


Semiconductor Memory Types


Semiconductor Memory

RAM
– Misnamed, as all semiconductor memory is random access
– Read/write
– Volatile
– Temporary storage
– Static or dynamic


Memory Cell Operation

[Figure: memory cell operation: (a) write, with select, control, and data-in signals; (b) read, with select, control, and sense signals]


Dynamic RAM

• Bits stored as charge in capacitors
• Charges leak
• Need refreshing even when powered
• Simpler construction
• Smaller per bit
• Less expensive
• Need refresh circuits
• Slower
• Used for main memory
• Essentially analogue
– Level of charge determines value


Refreshing

• Refresh circuit included on chip
• Disable chip
• Count through rows
• Read and write back
• Takes time
• Slows down apparent performance


Dynamic RAM Structure

[Figure: DRAM cell: a transistor, gated by the address line, connects bit line B to a storage capacitor whose other side is grounded]


DRAM Operation

• Address line active when a bit is read or written
– Transistor switch closed (current flows)
• Write
– Voltage to bit line: high for 1, low for 0
– Then signal the address line, which transfers charge to the capacitor
• Read
– Address line selected: the transistor turns on
– Charge from the capacitor is fed via the bit line to a sense amplifier, which compares it with a reference value to determine 0 or 1
– Capacitor charge must be restored


Typical 16 Mb DRAM (4M x 4)


Static RAM

• Bits stored as on/off switches
• No charges to leak
• No refreshing needed when powered
• More complex construction
• Larger per bit
• More expensive
• Does not need refresh circuits
• Faster
• Used for cache
• Digital
– Uses flip-flops


Static RAM Structure

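[Figure: SRAM cell: transistors T1-T6 between a dc voltage supply and ground, with internal nodes C1 and C2; the address line connects the cell to two bit lines (B and its complement)]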


SRAM and DRAM

Both volatile
– Power needed to preserve data

DRAM
– Simpler to build, smaller
– More dense
– Less expensive
– Needs refresh
– Larger memory units

SRAM
– Faster
– Used in cache


Read Only Memory (ROM)

Permanent storage
– Nonvolatile

Uses:
– Microprogramming (see later)
– Library subroutines
– Systems programs (BIOS)
– Function tables


Types of ROM

Written during manufacture
– Very expensive for small runs

Programmable (once)
– PROM
– Needs special equipment to program

Read “mostly”
– Erasable Programmable (EPROM): erased by UV
– Electrically Erasable (EEPROM): takes much longer to write than read
– Flash memory: erases whole memory electrically


Packaging



Design of Memory Device Module

[Example] Design of a 1K×32-bit memory device module using 1K×8-bit RAM chips
– Method: parallel connection of 4 RAM chips
– Capacity of module: (1K×8) × 4 = 1K×32 bits = 1K words
– Address bits (10 bits: A9 ~ A0): common connection to all chips
– Address area: 000H ~ 3FFH (H: hexadecimal)
– Data stored: 8 bits per chip



Design of 1K×32 bits Memory Device Module

[Figure: address lines A9-A0 go to all four chips; each chip supplies 8 lines of the 32-bit data bus]



Design of Memory Device Module (cont'd)

[Example] Design of a 4K×8-bit memory device module using 1K×8-bit RAM chips
– Method: serial connection of 4 RAM chips
– Capacity of module: (1K×8) × 4 = 4K×8 bits = 4K bytes
– Address bits (12 bits: A11 ~ A0):
+ Upper 2 bits: generate the 4 chip-select signals using an address decoder
+ Lower 10 bits: common connection to all chips
– Address area: 000H ~ FFFH (H: hexadecimal)
– Data stored: 8 bits per address



Design of 4K×8 bits Memory Device Module

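[Figure: a 2×4 decoder driven by A11-A10 selects one of the four 1K×8 chips; A9-A0 and data lines D7-D0 are shared by all chips]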



Address Areas of each RAM

Chip No   Address area (from ~ to)
RAM 0     000H ~ 3FFH
RAM 1     400H ~ 7FFH
RAM 2     800H ~ BFFH
RAM 3     C00H ~ FFFH



Design Procedure of Memory Module

Design Procedure
1. Decide the memory capacity for the computer system.
2. Choose chips and design the address map.
3. Design the circuit in detail.



Memory Design for 8-bit Micro Computer

• Capacity: 1K bytes RAM, 512 bytes ROM
• Address: RAM starts at 0, ROM starts at 800H
• Available chips: 256×8-bit RAM, 512×8-bit ROM

Address table: [Table: memory chip, address area (hexadecimal), and address bits]



Design Example of Memory Device for 8-bit Micro Computer

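[Figure: memory design example for the 8-bit microcomputer: address bus, an address decoder with outputs 3, 2, 1, 0 for chip selection, and the data bus]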



Cache Memory

[Wikipedia definition] A cache is a component that improves performance by transparently storing data such that future requests for that data can be served faster.

Purpose: a high-speed memory installed between the CPU and main memory to minimize CPU waiting time caused by the speed difference between the CPU and memory.

Characteristics
– Uses memory chips with a higher access speed than main memory
– Small capacity because of price and limited space

[Figure: the cache placed between the CPU and main memory]



Cache Memory

• Cache hit: the data the CPU wants to access already exists in the cache
• Cache miss: the data the CPU wants to access does not exist in the cache
• Cache hit ratio (H): the ratio (or percentage) of accesses that result in cache hits

$H = \dfrac{\text{number of cache hits}}{\text{total number of memory accesses}}$

• Cache miss ratio = $1 - H$
• Average access time of the memory device ($T_a$):

$T_a = H \times T_c + (1 - H) \times T_m$

where $T_c$ is the cache access time and $T_m$ is the main memory access time.


Computer Architecture: Cache Memory

Lecture #12


So you want fast?

It is possible to build a computer which uses only static RAM (see later).

This would be very fast.

This would need no cache.—How can you cache cache?

This would cost a very large amount.


Locality of Reference

During the course of the execution of a program, memory references tend to cluster.

e.g. loops


Cache

• Small amount of fast memory
• Sits between normal main memory and the CPU
• May be located on the CPU chip or module

[Figure: CPU and cache exchange words (word transfer); cache and main memory exchange blocks (block transfer)]


Cache operation - overview

• The CPU requests the contents of a memory location.
• The cache is checked for this data.
• If present, the data is delivered from the cache (fast).
• If not present, the required block is read from main memory into the cache.
• Then the data is delivered from the cache to the CPU.
• The cache includes tags to identify which block of main memory is in each cache slot.


Size does matter

Cost
– More cache is expensive.

Speed
– More cache is faster (up to a point).
– Checking the cache for data takes time.


Typical Cache Organization


Mapping Function

• Cache of 64 KBytes
• Cache block of 4 bytes
– i.e. the cache is 16K ($2^{14}$) lines of 4 bytes
• 16 MBytes main memory
• 24-bit address ($2^{24} = 16$M)


Direct Mapping

• Each block of main memory maps to only one cache line.
– i.e. if a block is in the cache, it must be in one specific place
• The address is in two parts.
– The least significant w bits identify a unique word.
– The most significant t + s bits specify one memory block.
• These most significant bits are split into a slot (line) field of s bits and a tag of t bits (most significant).


Direct Mapping-Address Structure

Tag field (t) | Slot field (s) | Word field (w)
8 bits | 14 bits | 2 bits

• 24-bit address
• 2-bit word identifier (4-byte block)
• 22-bit block identifier
  – 8-bit tag (= 22 - 14)
  – 14-bit slot or line
• No two blocks in the same line have the same tag field.
• Check the contents of the cache by finding the line and checking the tag.
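The field split and the hit check can be written with shifts and masks. In this sketch the example address 16339C (hex) and the single stored tag are hypothetical; the field widths are the 8/14/2 bits from the slide.

```python
TAG_BITS, SLOT_BITS, WORD_BITS = 8, 14, 2  # field widths from the slide


def split_direct(address):
    """Split a 24-bit address into (tag, slot, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    slot = (address >> WORD_BITS) & ((1 << SLOT_BITS) - 1)
    tag = address >> (WORD_BITS + SLOT_BITS)
    return tag, slot, word


# One stored tag per cache slot; None means the slot is empty.
stored_tags = [None] * (1 << SLOT_BITS)
stored_tags[0x0CE7] = 0x16  # hypothetical: the block with tag 16 is in slot 0CE7


def is_hit(address):
    """Find the slot from the address, then compare its stored tag."""
    tag, slot, _ = split_direct(address)
    return stored_tags[slot] == tag


tag, slot, word = split_direct(0x16339C)
print(f"tag={tag:02X} slot={slot:04X} word={word}")  # tag=16 slot=0CE7 word=0
print(is_hit(0x16339C), is_hit(0x17339C))              # True False
```

Addresses 16339C and 17339C select the same slot but carry different tags, which is exactly the case the tag comparison distinguishes.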


Direct Mapping - Cache Slot Table

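Cache slot | Main memory blocks held
0 | 0, m, 2m, 3m, …, $2^{t+s} - m$
1 | 1, m+1, 2m+1, …, $2^{t+s} - m + 1$
… | …
m-1 | m-1, 2m-1, 3m-1, …, $2^{t+s} - 1$

Slot i holds the blocks j for which i = j mod m.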
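The table follows directly from the rule i = j mod m. This sketch lists the blocks each slot can hold for a hypothetical tiny system with m = 4 slots and 16 main memory blocks; `blocks_in_slot` is an illustrative name.

```python
def blocks_in_slot(slot, m, num_blocks):
    """Main memory blocks j that direct mapping places in slot i = j mod m."""
    return list(range(slot, num_blocks, m))


# Hypothetical tiny system: m = 4 slots, 16 blocks of main memory.
for i in range(4):
    print(i, blocks_in_slot(i, 4, 16))
# 0 [0, 4, 8, 12]
# 1 [1, 5, 9, 13]
# 2 [2, 6, 10, 14]
# 3 [3, 7, 11, 15]
```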


Direct Mapping Cache Organization

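[Figure: the memory address is split into Tag | Slot | Word; the slot field selects one cache slot (Slot(0) … Slot(m-1)), and a comparator checks that slot's stored tag against the address tag: a match is a cache hit (data read from the cache), a mismatch is a cache miss (block read from main memory)]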


Direct Mapping Summary

Address length = (t + s + w) bits

Number of addressable units = $2^{t+s+w}$ words or bytes

Block size = $2^w$ words or bytes

Number of blocks in main memory = $2^{t+s+w} / 2^w = 2^{t+s}$

Number of slots in cache = $m = 2^s$

Size of tag = t bits


Direct Mapping Characteristics

Simple

Inexpensive

Fixed location for a given block: if a program repeatedly accesses 2 blocks that map to the same line, the blocks keep evicting each other and the miss ratio becomes very high.
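The effect of the fixed location can be seen with a small direct-mapped hit counter. The 8-slot cache and the block traces are hypothetical; only block numbers are simulated, not data.

```python
def direct_mapped_hits(block_trace, num_slots):
    """Count hits when block j may only live in slot j mod num_slots."""
    stored_tags = [None] * num_slots  # None = empty slot
    hits = 0
    for block in block_trace:
        slot = block % num_slots
        tag = block // num_slots
        if stored_tags[slot] == tag:
            hits += 1
        else:
            stored_tags[slot] = tag  # miss: the new block overwrites the old one
    return hits


# Hypothetical 8-slot cache, 10 accesses each.
print(direct_mapped_hits([0, 8] * 5, 8))  # blocks 0 and 8 share slot 0 -> 0 hits
print(direct_mapped_hits([0, 1] * 5, 8))  # different slots -> 8 hits
```

Only 2 of the 8 slots are ever used, yet the first trace never hits, because both blocks compete for slot 0.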


Associative Mapping

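A main memory block can be loaded into any line of the cache.

The memory address is interpreted as a tag and a word; the tag uniquely identifies a block of memory.

Every line's tag is examined for a match.

Cache searching gets expensive, because the address tag must be compared with the tag of every line (one comparator per line if the search is done in parallel).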
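For contrast with the direct-mapped counter, the sketch below lets any block occupy any line. The slides do not say which line is replaced when the cache is full, so least recently used (LRU) replacement is an assumption here; Python's `OrderedDict` keeps the lines in recency order, and the dictionary lookup stands in for the hardware's parallel tag comparison.

```python
from collections import OrderedDict


def fully_associative_hits(block_trace, num_lines):
    """Count hits when any block may occupy any cache line.
    Assumption: least recently used (LRU) replacement when the cache is full."""
    lines = OrderedDict()  # tag (= block number) -> None, least recent first
    hits = 0
    for block in block_trace:
        if block in lines:  # stands in for comparing every line's tag
            hits += 1
            lines.move_to_end(block)  # mark as most recently used
        else:
            if len(lines) == num_lines:
                lines.popitem(last=False)  # evict the least recently used line
            lines[block] = None
    return hits


print(fully_associative_hits([0, 8] * 5, 8))  # 8 hits: both blocks fit
```

The trace that never hits in a direct-mapped 8-slot cache now misses only on the first access to each block.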


Fully Associative Cache Organization

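[Figure: the memory address is split into Tag | Word; a comparator checks the address tag against the stored tag of every slot (Slot(0) … Slot(m-1)): a match is a cache hit (data read from that slot), no match is a cache miss (block read from main memory)]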


Associative Mapping Example

[Figure: a 128-byte main memory whose address is split into Tag | Word, and a 32-byte cache in which each slot holds a 5-bit tag and 32 bits of data]


Associative Mapping - Address Structure

Tag: 5 bits | Word: 2 bits (128 bytes of main memory need a 7-bit address)

The 5-bit tag is stored with each 32-bit block of data.

Compare the tag field with the tag entries in the cache to check for a hit.

The least significant 2 bits of the address identify which byte (8 bits) is required from the 32-bit data block.
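For the 128-byte example, splitting an address takes one shift and one mask. The address 1011101 (binary, 93 decimal) is a hypothetical value; `split_associative` is an illustrative name.

```python
TAG_BITS, WORD_BITS = 5, 2  # 128-byte memory -> 7-bit address = 5-bit tag + 2-bit word


def split_associative(address):
    """Split a 7-bit address into (tag, byte within the 32-bit block)."""
    if not 0 <= address < 1 << (TAG_BITS + WORD_BITS):
        raise ValueError("address out of range for a 128-byte memory")
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


tag, word = split_associative(0b1011101)  # hypothetical address 93
print(f"tag={tag:05b} word={word:02b}")   # prints tag=10111 word=01
```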


Associative Mapping Summary

Address length = (t + w) bits

Number of addressable units = $2^{t+w}$ words or bytes

Slot size = $2^w$ words or bytes

Number of blocks in main memory = $2^{t+w} / 2^w = 2^t$

Number of slots in cache: not determined by the address format

Size of tag = t bits


Set Associative Mapping

The cache is divided into a number of sets.

Each set contains a number of lines.

A given block maps to any line in a given set, e.g. block B can be in any line of set i.

e.g. with 2 lines per set (2-way associative mapping), a given block can be in one of 2 lines in only one set.

Address: Tag field | Set field | Word field
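A set-associative hit counter combines the two previous ideas: the set is chosen as in direct mapping, and the lines inside the set are searched as in associative mapping. LRU replacement within a set is an assumption (the slides do not specify a policy), and the trace is hypothetical.

```python
from collections import OrderedDict


def set_associative_hits(block_trace, num_sets, ways):
    """k-way set-associative cache (k = ways): block j maps to set j mod num_sets
    and may occupy any line of that set. Assumption: LRU replacement per set."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for block in block_trace:
        lines = sets[block % num_sets]  # set field selects the set
        tag = block // num_sets         # remaining block bits form the tag
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)
        else:
            if len(lines) == ways:
                lines.popitem(last=False)
            lines[tag] = None
    return hits


# The same 8 lines organized three ways (hypothetical trace):
trace = [0, 8] * 5
print(set_associative_hits(trace, 8, 1))  # direct mapped: 0 hits
print(set_associative_hits(trace, 4, 2))  # 2-way: 8 hits
print(set_associative_hits(trace, 1, 8))  # one set = fully associative: 8 hits
```

With one line per set the organization is direct mapping; with a single set holding every line it is fully associative mapping.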


Set Associative Mapping Example

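[Figure: the memory address is split into Tag | Set | Word; the set field selects one set (Set(0) … Set(m-1)), each holding Slot(0) and Slot(1), and a comparator checks the address tag against the stored tags in that set: a match is a cache hit, no match is a cache miss (block read from main memory)]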


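Set Associative Mapping - Address Structure

Tag: 9 bits | Set: 13 bits | Word: 2 bits

Use the set field to determine which cache set to look in.

Compare the tag field to see if we have a hit.

e.g.

Address | Tag | Data | Set number
1FF 7FFC | 1FF | 12345678 | 1FFF
001 7FFC | 001 | 11223344 | 1FFF

Both addresses map to set 1FFF but carry different tags, so a set with at least 2 lines can hold both blocks at the same time.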
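The example rows can be checked with the 9/13/2 split. This sketch assumes the Address column lists the 9-bit tag followed by the remaining 15 bits (set + word) in hex, which gives the full 24-bit addresses FFFFFC and 00FFFC; `split_set_assoc` is an illustrative name.

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2  # 24-bit address, as on the slide


def split_set_assoc(address):
    """Split a 24-bit address into (tag, set number, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    set_number = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (SET_BITS + WORD_BITS)
    return tag, set_number, word


# Assumed reading of the example rows: (9-bit tag, low 15 bits) in hex.
for tag_part, low_bits in [(0x1FF, 0x7FFC), (0x001, 0x7FFC)]:
    address = (tag_part << (SET_BITS + WORD_BITS)) | low_bits
    tag, set_number, word = split_set_assoc(address)
    print(f"{address:06X}: tag={tag:03X} set={set_number:04X} word={word}")
# FFFFFC: tag=1FF set=1FFF word=0
# 00FFFC: tag=001 set=1FFF word=0
```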


Set Associative Mapping Summary

(Here s is the number of block-identifier bits, split into a d-bit set field and an (s - d)-bit tag.)

Address length = (s + w) bits

Number of addressable units = $2^{s+w}$ words or bytes

Block size = line size = $2^w$ words or bytes

Number of blocks in main memory = $2^s$

Number of lines in set = k

Number of sets = $v = 2^d$

Number of lines in cache = $kv = k \times 2^d$

Size of tag = (s - d) bits
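Plugging the earlier example into the summary is a useful consistency check: a 22-bit block identifier (s = 22), 2-bit word (w = 2), 13-bit set field (d = 13) and, as an assumption matching the 2-way discussion, k = 2 lines per set. The function and dictionary key names are illustrative.

```python
def set_associative_geometry(s, w, d, k):
    """Quantities from the summary slide: s = block-identifier bits,
    w = word bits, d = set bits, k = lines per set."""
    return {
        "address_bits": s + w,
        "addressable_units": 2 ** (s + w),
        "line_size": 2 ** w,
        "blocks_in_memory": 2 ** s,
        "sets": 2 ** d,
        "lines_in_cache": k * 2 ** d,
        "tag_bits": s - d,
    }


g = set_associative_geometry(s=22, w=2, d=13, k=2)
print(g["lines_in_cache"], g["tag_bits"], g["lines_in_cache"] * g["line_size"])
# 16384 9 65536
```

The result, 16K lines of 4 bytes (64 KBytes) with a 9-bit tag, agrees with the 64 KByte cache and the 9/13/2 address format used in the examples.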


Pentium 4 Cache

80386 – no on-chip cache

80486 – 8 KBytes using 16-byte lines and a four-way set associative organization

Pentium (all versions) – two on-chip L1 caches
  – Data & instructions

Pentium 4 – L1 caches
  – 8 KBytes
  – 64-byte lines
  – four-way set associative

L2 cache
  – Feeding both L1 caches
  – 256 KBytes
  – 128-byte lines
  – 8-way set associative
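The listed parameters fix the number of sets and the width of the set and offset fields. The sketch below computes them, reading "8k" and "256k" on the original slide as KBytes; `cache_geometry` is an illustrative name and assumes all sizes are powers of two.

```python
def cache_geometry(cache_bytes, line_bytes, ways):
    """Return (sets, set-field bits, byte-offset bits); sizes must be powers of two."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    return sets, sets.bit_length() - 1, line_bytes.bit_length() - 1


print(cache_geometry(8 * 1024, 16, 4))     # 80486 on-chip cache: (128, 7, 4)
print(cache_geometry(8 * 1024, 64, 4))     # Pentium 4 L1 data cache: (32, 5, 6)
print(cache_geometry(256 * 1024, 128, 8))  # Pentium 4 L2 cache: (256, 8, 7)
```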


Pentium 4 Core Processor

Fetch/Decode Unit
  – Fetches instructions from the L2 cache
  – Decodes them into micro-ops
  – Stores micro-ops in the L1 cache

Out-of-order execution logic
  – Schedules micro-ops
  – Based on data dependence and resources
  – May speculatively execute

Execution units
  – Execute micro-ops
  – Take data from the L1 cache
  – Leave results in registers

Memory subsystem
  – L2 cache and system bus


Pentium 4 Design

Decodes instructions into RISC-like micro-ops before the L1 cache

Micro-ops are fixed length
  – Superscalar pipelining and scheduling

Pentium instructions are long and complex

Performance is improved by separating decoding from scheduling and pipelining
  – (More later – ch14)

Data cache is write back
  – Can be configured to write through

L1 cache is controlled by 2 bits in control register CR0
  – CD = cache disable
  – NW = not write-through
  – 2 instructions: invalidate (flush) the cache (INVD), and write back then invalidate (WBINVD)


DRAM

1. Synchronous DRAM (SDRAM)
  – Adds a clock signal to the DRAM interface, so that repeated transfers do not bear the overhead of synchronizing with the DRAM controller

2. Double Data Rate (DDR SDRAM)
  – Transfers data on both the rising and the falling edge of the DRAM clock signal, doubling the peak data rate
  – DDR2 lowers power by dropping the voltage from 2.5 to 1.8 volts and offers higher clock rates: up to 400 MHz
  – DDR3 drops to 1.5 volts and offers higher clock rates: up to 800 MHz

Improved Bandwidth, not Latency


DRAM

Standard | Clock rate (MHz) | M transfers/second | DRAM name | Mbytes/s/DIMM | DIMM name
DDR | 133 | 266 | DDR266 | 2128 | PC2100
DDR | 150 | 300 | DDR300 | 2400 | PC2400
DDR | 200 | 400 | DDR400 | 3200 | PC3200
DDR2 | 266 | 533 | DDR2-533 | 4264 | PC4300
DDR2 | 333 | 667 | DDR2-667 | 5336 | PC5300
DDR2 | 400 | 800 | DDR2-800 | 6400 | PC6400
DDR3 | 533 | 1066 | DDR3-1066 | 8528 | PC8500
DDR3 | 666 | 1333 | DDR3-1333 | 10664 | PC10700
DDR3 | 800 | 1600 | DDR3-1600 | 12800 | PC12800

(M transfers/second = 2 × clock rate; Mbytes/s/DIMM = 8 × M transfers/second, since a DIMM transfers 8 bytes (64 bits) at a time)
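The two multipliers in the DDR table can be checked against its rows. This sketch assumes a standard 64-bit (8-byte) DIMM data path; the listed clock rates are rounded (for example, DDR2-533 actually runs at 266.67 MHz), so the "2 × clock" check allows a difference of 1. The function name is illustrative.

```python
BUS_BYTES = 8  # a standard DIMM moves 64 bits (8 bytes) per transfer


def peak_mbytes_per_s(mega_transfers):
    """Peak DIMM bandwidth in MBytes/s: 8 bytes per transfer."""
    return mega_transfers * BUS_BYTES


# (clock MHz, M transfers/s, Mbytes/s/DIMM) rows copied from the DDR table
rows = [(133, 266, 2128), (266, 533, 4264), (666, 1333, 10664), (800, 1600, 12800)]
for clock, mts, mbs in rows:
    # DDR transfers on both clock edges, so M transfers/s is about 2 x clock
    print(clock, mts, abs(mts - 2 * clock) <= 1, peak_mbytes_per_s(mts) == mbs)
```

Every row prints `True True`; the higher numbers come from more transfers per second, which is why the slide stresses improved bandwidth rather than latency.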

Error Correction

Motivation:
  – Failures per unit time are proportional to the number of bits!
  – As DRAM cells shrink, they become more vulnerable

There was a period in which the failure rate was low enough that people did not use error correction
  – DRAM banks are too large now
  – Servers have always used error-corrected memory systems

Basic idea: add redundancy through parity bits
  – Common configuration: random error correction
    – SEC-DED (single error correct, double error detect)
    – One example: 64 data bits + 8 parity bits (8 of every 72 stored bits, about 11% overhead)
  – Really want to handle failures of physical components as well
    – Organization is multiple DRAMs per DIMM, multiple DIMMs
    – Want to recover from a failed DRAM and a failed DIMM!
    – "Chip kill": handle failures the width of a single DRAM chip
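The "64 data bits + 8 parity bits" figure can be reproduced from the standard Hamming-code bound: single error correction for m data bits needs the smallest r with $2^r \ge m + r + 1$, and one extra overall parity bit adds double error detection. The sketch below assumes that Hamming-based construction; the function name is illustrative.

```python
def sec_ded_check_bits(data_bits):
    """Hamming SEC needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall parity bit upgrades it to SEC-DED."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


for m in (8, 16, 32, 64):
    c = sec_ded_check_bits(m)
    print(m, c, f"{c / (m + c):.1%}")  # overhead as a share of all stored bits
# the last line is: 64 8 11.1%
```

Wider data words amortize the check bits better, which is one reason memory ECC operates on 64-bit words rather than individual bytes.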

