Date post: | 17-Jan-2016 |
Category: |
Documents |
Upload: | mariah-hubbard |
Computer Architecture
Chang-Bum Lee
Dept. of Computer Engineering, Youngsan University
Course Content (1)
Lecture #1: Course Overview
- Course Contents
- Course Schedule
- Grading Guidelines
- Tests and Assignments
Lecture #2: Basic Architecture of Computer
- Basic Architecture
- System Configuration
Lecture #3: Instruction Execution
- Fetch Cycle
- Execution Cycle
- Interrupt Cycle
Course Content (2)
Lecture #4, 5: Instruction Set
- Program Control
- Instruction Formats
- Addressing Modes
- Pentium Processors
Lecture #6, 7: Arithmetic and Logical Operations
- Arithmetic and Logical Unit
- Integer Representation
- Logic Operations
- Shift Operations
- Arithmetic Operations of Integers (Addition, Subtraction, Multiplication, and Division)
Course Content (3)
Lecture #8, 9: Real Numbers
- Representation of Floating Point Numbers
- Arithmetic Operations of Floating Point Numbers (Addition, Subtraction, Multiplication, and Division)
Lecture #10: Control Unit
- Structure of Control Unit
- Microinstruction
- Microprogram
Lecture #11: Memory Devices
- Memory Hierarchy
- RAM
- ROM
- Design of Memory Device Modules
Course Content (4)
Lecture #12: Cache Memory
- Cache Size
- Fetch Method
- Mapping
Computer Architecture: Course Overview
Lecture #1
Course Objectives
Understand the role and relationship of hardware and software
Exposure to:
- Machine organization
- Assembly language programming
- C programming
Be able to actually build an entire (slow) computing system
- Hardware and software
Be distinguished from mere programmers
Course Schedule
The complete course, including lectures and seminars, will be covered in 90 hours (15 weeks).
The total duration of the course will be 4 months.
Lectures: 3 hours (2 hours + 1 hour) weekly
Grading Guidelines
Attendance: 20%
- Depends on students' class participation
Final Exam: 40%
- Textbook-based, in-class final exam
Midterm Exam: 30%
- Textbook-based, in-class midterm exam
Assignments: 10%
- Based on submitted assignments
Course References
Computer Architecture / Jong-Hyun Kim
- Published by Sang Lung Publishing Corp.
The course slides will be available at
http://prof.ysu.ac.kr/blog/postlist.asp?b_id=cblee
Course Summary
Introduction to computer architecture
- How is data represented?
- What are the pieces of a computer?
- How do computers work?
Programming
- How do I "talk" directly to the machine?
- How do I program in C?
Computer Systems and Computation
- How do simple HW/SW elements come together to realize complex computations?
Computer Architecture: Basic Architecture
Lecture #2
Introduction - Architecture (1)
Architecture is those attributes visible to the programmer.
- Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques
- e.g. Is there a multiply instruction?
Organization is how features are implemented.
- Control signals, interfaces, memory technology
- e.g. Is there a hardware multiply unit or is it done by repeated addition?
Introduction - Architecture (2)
All members of the Intel x86 family share the same basic architecture.
The IBM System/370 family shares the same basic architecture.
This gives code compatibility.
- At least backwards
Organization differs between different versions.
Structure & Function
Structure is the way in which components relate to each other.
Function is the operation of individual components as part of the structure.
All computer functions are:
- Data processing
- Data storage
- Data movement
- Control
ENIAC
Electronic Numerical Integrator And Computer
Eckert and Mauchly at the University of Pennsylvania
Trajectory tables for weapons
Started 1943, finished 1946
- Too late for the war effort
Used until 1955
- Decimal (not binary)
- 20 accumulators of 10 digits
- Programmed manually by switches
- 18,000 vacuum tubes, 30 tons
- 15,000 square feet
- 140 kW power consumption
- 5,000 additions per second
Structure of von Neumann Machine
Stored program concept
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions from memory and executing them
Input and output equipment operated by the control unit
Princeton Institute for Advanced Studies
- IAS
Completed 1952
Transistor Based Computers
Transistors
- Replaced vacuum tubes
- Smaller
- Cheaper
- Less heat dissipation
- Solid state device
- Made from silicon (sand)
- Invented 1947 at Bell Labs
- William Shockley et al.
Transistor based computers
- Second generation machines
- NCR & RCA produced small transistor machines
- IBM 7000
- DEC, 1957
  - Produced the PDP-1
Speeding It Up & Performance Mismatch
Speeding it up
- Pipelining
- On board cache (L1 & L2 cache)
- Branch prediction
- Data flow analysis
- Speculative execution
Performance Mismatch
- Processor speed increased
- Memory capacity increased
- Memory speed lags behind processor speed
Solutions
Increase number of bits retrieved at one time.
- Make DRAM “wider” rather than “deeper”
Change DRAM interface.
- Cache
Reduce frequency of memory access.
- More complex cache and cache on chip
Increase interconnection bandwidth.
- High speed buses
- Hierarchy of buses
Program Concept
Hardwired systems are inflexible.
General purpose hardware can do different tasks, given correct control signals.
Instead of re-wiring, supply a new set of control signals.
A program is a sequence of steps.
- For each step, an arithmetic or logical operation is done.
- For each operation, a different set of control signals is needed.
Computer Components
The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit.
Data and instructions need to get into the system and results out.
- Input/output
Temporary storage of code and results is needed.
- Main memory
Computer Architecture: CPU Structures and Functions
Lecture #3
CPU Structure
The CPU must:
- Fetch instructions
- Interpret instructions
- Fetch data, process data, and write data
Registers
- The CPU must have some working space (temporary storage)
- Number and function vary between processor designs
- One of the major design decisions
- Top level of the memory hierarchy
Control Unit
- Coordinates the sequence of execution steps
ALU
- Performs arithmetic and logical processing
[Figure: Registers, ALU, and Control Unit on the CPU internal bus; Address Bus, Data Bus, and Control Bus form the System Bus]
CPU Structure
[Figure: Software, Instruction Set, Hardware]
Fetch Cycle (1)
Program Counter (PC) holds the address of the next instruction to fetch.
Processor fetches the instruction from the memory location pointed to by PC.
Increment PC
- Unless told otherwise
Instruction is loaded into the Instruction Register (IR)
  - t0: MAR <- PC
  - t1: MBR <- M[MAR], PC <- PC+1
  - t2: IR <- MBR
Processor interprets the instruction and performs the required actions
Fetch Cycle (2)
Micro-operations
  - t0: MAR <- PC
  - t1: MBR <- M[MAR], PC <- PC+1
  - t2: IR <- MBR
[Figure: address and instruction flow in the fetch cycle between the Control Unit and Memory Devices over the Address Bus, Control Bus, and Data Bus]
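The three fetch micro-operations can be traced in a few lines of Python. This is only a sketch: the dictionary-based register file, the memory contents, and the function name `fetch` are assumptions made for illustration, not part of the course material.

```python
def fetch(regs, memory):
    """Run the fetch micro-operations t0..t2 on a register dictionary."""
    regs["MAR"] = regs["PC"]            # t0: MAR <- PC
    regs["MBR"] = memory[regs["MAR"]]   # t1: MBR <- M[MAR]
    regs["PC"] += 1                     #     PC  <- PC + 1 (same time step)
    regs["IR"] = regs["MBR"]            # t2: IR  <- MBR
    return regs


# Assumed program image: two instructions stored at addresses 100 and 101.
memory = {100: "LOAD 250", 101: "ADD 251"}
regs = {"PC": 100, "MAR": 0, "MBR": 0, "IR": None}

fetch(regs, memory)
print(regs["IR"], regs["PC"])  # LOAD 250 101
```

After one call the IR holds the instruction that was at the old PC, and the PC already points to the next instruction.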
Execute Cycle (1)
Processor-memory
- Data transfer between CPU and main memory
Processor-I/O
- Data transfer between CPU and I/O module
Data processing
- Some arithmetic or logical operation on data
Control
- Alteration of sequence of operations
- e.g. jump
Combination of above
Execute Cycle (2)
Example
- LOAD addr:
  - t0: MAR <- IR(addr)
  - t1: MBR <- M[MAR]
  - t2: AC <- MBR
- STA addr
- ADD addr
[Figure: data flow between the Control Unit and Memory Devices over the Address Bus, Control Bus, and Data Bus]
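The execute cycle for the three example instructions can be sketched the same way. The slide spells out the micro-operations only for LOAD; the sequences used here for STA and ADD are assumptions built from the same register-transfer style, and the function name `execute` and the memory contents are hypothetical.

```python
def execute(opcode, addr, regs, memory):
    """Apply the execute-cycle micro-operations for one instruction."""
    regs["MAR"] = addr                          # t0: MAR <- IR(addr)
    if opcode == "LOAD":
        regs["MBR"] = memory[regs["MAR"]]       # t1: MBR <- M[MAR]
        regs["AC"] = regs["MBR"]                # t2: AC  <- MBR
    elif opcode == "STA":                       # assumed sequence
        regs["MBR"] = regs["AC"]                # t1: MBR <- AC
        memory[regs["MAR"]] = regs["MBR"]       # t2: M[MAR] <- MBR
    elif opcode == "ADD":                       # assumed sequence
        regs["MBR"] = memory[regs["MAR"]]       # t1: MBR <- M[MAR]
        regs["AC"] = regs["AC"] + regs["MBR"]   # t2: AC  <- AC + MBR
    else:
        raise ValueError(f"unknown opcode: {opcode}")
    return regs


memory = {250: 7, 251: 5, 252: 0}   # assumed data values
regs = {"AC": 0, "MAR": 0, "MBR": 0}
for opcode, addr in [("LOAD", 250), ("ADD", 251), ("STA", 252)]:
    execute(opcode, addr, regs, memory)
print(memory[252])  # 12
```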
Interrupt Cycle
Added to instruction cycle
Processor checks for interrupt
- Indicated by an interrupt signal
If no interrupt, fetch next instruction
If interrupt pending:
- Suspend execution of current program
- Save context
- Set PC to start address of interrupt handler routine
- Process interrupt
- Restore context and continue interrupted program
Multiple Interrupts(1)
Disable interrupts
- Processor will ignore further interrupts while processing one interrupt
- Interrupts remain pending and are checked after the first interrupt has been processed
- Interrupts are handled in sequence as they occur
Multiple Interrupts(2)
Define priorities
- Low priority interrupts can be interrupted by higher priority interrupts
- When the higher priority interrupt has been processed, the processor returns to the previous interrupt
Indirect Cycle
May require memory access to fetch operands
Indirect addressing requires more memory accesses
Can be thought of as additional instruction subcycle
Prefetch
Fetch accesses main memory
Execution usually does not access main memory
Can fetch the next instruction during execution of the current instruction
Called instruction prefetch
Improved Performance
But not doubled:
- Fetch is usually shorter than execution
  - Prefetch more than one instruction?
- Any jump or branch means that prefetched instructions are not the required instructions
Add more stages to improve performance
Pipelining
Fetch instruction
Decode instruction
Calculate operands (i.e. EAs)
Fetch operands
Execute instructions
Write result
Overlap these operations
Two Stage Instruction Pipeline
[Figure: (a) Simplified view: Fetch and Execute stages with Instruction and Result. (b) Expanded view: the same stages with Wait, New Address, and Discard]
Memory Connection
Receives and sends data
Receives addresses (of locations)
Receives control signals
- Read
- Write
- Timing
Input/Output Connection
Similar to memory from computer’s viewpoint
Output
- Receive data from computer
- Send data to peripheral
Input
- Receive data from peripheral
- Send data to computer
Receive control signals from computer
Send control signals to peripherals
- e.g. spin disk
Receive addresses from computer
- e.g. port number to identify peripheral
Send interrupt signals (control)
CPU Connection
Reads instructions and data
Writes out data (after processing)
Sends control signals to other units
Receives (and acts on) interrupts
Buses
- There are a number of possible interconnection systems
- Single and multiple bus structures are most common
- e.g. Control/Address/Data bus (PC)
- e.g. Unibus (DEC-PDP)
What is a Bus?
A communication pathway connecting two or more devices
Usually broadcast
Often grouped
- A number of channels in one bus
- e.g. 32 bit data bus is 32 separate single bit channels.
Power lines may not be shown
Data Bus and Address Bus
Data Bus
- Carries data
  - Remember that there is no difference between “data” and “instruction” at this level
- Width is a key determinant of performance
  - 8, 16, 32, 64 bit
Address Bus
- Identifies the source or destination of data
- e.g. CPU needs to read an instruction (data) from a given location in memory
- Bus width determines maximum memory capacity of system
  - e.g. 8080 has a 16 bit address bus giving a 64K address space
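The link between address bus width and address space is a power of two. A minimal sketch, with `address_space` as a hypothetical helper name:

```python
def address_space(address_bits):
    """Number of distinct addresses reachable with the given bus width."""
    return 2 ** address_bits


locations = address_space(16)   # 16-bit address bus, as on the 8080
print(locations)                # 65536
print(locations // 1024)        # 64, i.e. a 64K address space
```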
Control Bus
Control and timing information
- Memory read/write signal
- Interrupt request
- Clock signals
Single Bus Problems
Lots of devices on one bus leads to:
- Propagation delays
  - Long data paths mean that co-ordination of bus use can adversely affect performance.
  - Performance also suffers if aggregate data transfer approaches bus capacity.
Most systems use multiple buses to overcome these problems.
Bus Types and Arbitration
Bus Types
- Dedicated
  - Separate data & address lines
- Multiplexed
  - Shared lines
  - Address valid or data valid control line
  - Advantage: fewer lines
  - Disadvantages: more complex control, lower ultimate performance
Bus Arbitration
- More than one module controlling the bus
- e.g. CPU and DMA controller
- Only one module may control bus at one time
- Arbitration may be centralised or distributed
Timing
Co-ordination of events on bus
Synchronous
- Events determined by clock signals
- Control Bus includes clock line
- A single 1-0 is a bus cycle
- All devices can read clock line
- Usually sync on leading edge
- Usually a single cycle for an event
Asynchronous
- Read, Write
Memory Hierarchy & Physical Types
Memory Hierarchy
- Registers
  - Exist in CPU
- Internal or Main memory
  - May include one or more levels of cache
  - Mainly “RAM”
- External memory
  - Backing store
Physical Types
- Semiconductor types are mainly RAM
- Magnetic types are Disk & Tape
- Optical types are CD & DVD
- Others are Bubble, Hologram, etc.
Performance
Access time
- Time between presenting the address and getting the valid data
Memory Cycle time
- Time may be required for the memory to “recover” before next access.
- Cycle time is access + recovery.
Transfer Rate
- Rate at which data can be moved.
Instruction Representation
In machine code each instruction has a unique bit pattern.
For human consumption (well, programmers anyway) a symbolic representation is used.
- e.g. ADD, SUB, LOAD
Operands can also be represented in this way.
- ADD A,B
Computer Architecture: Instruction Types and Addressing Modes
Lecture #4, #5
50Computer Architecture
Instruction Format and Types
Simple Instruction Format (16 bits)
Opcode (4 bits) | Operand Reference (6 bits) | Operand Reference (6 bits)
Instruction Types
- Data processing
- Data storage (main memory)
- Data movement (I/O)
- Program flow control
Number of Addresses (1)
3 addresses
- Operand 1, Operand 2, Result
- a = b + c;
- May be a fourth: next instruction (usually implicit)
- Not common
- Needs very long words to hold everything
Number of Addresses (2)
2 addresses
- One address doubles as operand and result.
- a = a + b
- Reduces length of instruction
- Requires some extra work
  - Temporary storage to hold some results
1 address
- Implicit second address
- Usually a register (accumulator)
- Common on early machines
Number of Addresses (3)
0 (zero) addresses
- All addresses implicit
- Uses a stack
- e.g.
  - push a
  - push b
  - add
  - pop c
- c = a + b
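The zero-address example (push a, push b, add, pop c) can be run on a tiny stack machine. This is an illustrative sketch; the interpreter `run_stack_program`, its instruction syntax, and the variable values are assumptions.

```python
def run_stack_program(program, variables):
    """Interpret push/pop/add instructions using an implicit operand stack."""
    stack = []
    for instruction in program:
        op, *operand = instruction.split()
        if op == "push":
            stack.append(variables[operand[0]])   # push a named variable
        elif op == "pop":
            variables[operand[0]] = stack.pop()   # pop into a named variable
        elif op == "add":
            right = stack.pop()                   # both operands are implicit
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return variables


variables = run_stack_program(
    ["push a", "push b", "add", "pop c"], {"a": 2, "b": 3}
)
print(variables["c"])  # 5
```

Only push and pop name an address here; add carries no address at all, which is the point of the zero-address format.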
Design Decisions (1)
Operation repertoire
- How many ops?
- What can they do?
- How complex are they?
Data types
Instruction formats
- Length of op code field
- Number of addresses
Addressing Modes
Immediate
Direct
Indirect
Register
Register Indirect
Displacement (Indexed)
Stack
Immediate Addressing
Operand is part of instruction
Operand = address field
e.g. ADD 5
- Add 5 to contents of accumulator
- 5 is operand
No memory reference to fetch data
Fast
Limited range
Immediate Addressing Diagram
[Figure: instruction with Opcode and Operand fields]
Direct Addressing
Address field contains address of operand.
Effective address (EA) = address field (A)
e.g. ADD A
- Add contents of address A to accumulator
Single memory reference to access data
No additional calculations to work out effective address
Limited address space
Direct Addressing Diagram
[Figure: instruction with Opcode and Address A; A selects the Operand in Memory]
Indirect Addressing
Memory cell pointed to by address field contains the address of (pointer to) the operand.
EA = (A)
- Look in A, find address (A) and look there for operand.
e.g. ADD (A)
- Add contents of cell pointed to by contents of A to accumulator.
Large address space
- 2^n where n = word length
May be nested, multilevel, cascaded
- e.g. EA = (((A)))
- Draw the diagram yourself
Multiple memory accesses to find operand
Hence slower
Indirect Addressing Diagram
[Figure: instruction with Opcode and Address A; A selects a pointer in Memory, which selects the Operand]
Register Addressing (1)
Operand is held in register named in address field.
EA = R
Limited number of registers
Very small address field needed
- Shorter instructions
- Faster instruction fetch
Register Addressing (2)
No memory access
Very fast execution
Very limited address space
Multiple registers help performance
- Requires good assembly programming or compiler writing
- N.B. C programming
  - register int a;
c.f. Direct addressing
Register Addressing Diagram
[Figure: instruction with Opcode and Register Address R; R selects the Operand in Registers]
Register Indirect Addressing
C.f. indirect addressing
EA = (R)
Operand is in memory cell pointed to by contents of register R
Large address space (2^n)
One fewer memory access than indirect addressing
Register Indirect Addressing Diagram
[Figure: instruction with Opcode and Register Address R; R selects a pointer in Registers, which selects the Operand in Memory]
Displacement Addressing
EA = A + (R)
Address field holds two values
- A = base value
- R = register that holds displacement
- or vice versa
Displacement Addressing Diagram
[Figure: instruction with Opcode, Register R, and Address A; the contents of R are added to A to select the Operand in Memory]
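The effective-address rules for the modes covered so far can be compared side by side. The sketch below uses a dictionary for memory and another for registers; the function names, addresses, and stored values are all hypothetical.

```python
def fetch_operand(mode, a, memory, registers):
    """Return the operand for address field `a` under a one-field mode."""
    if mode == "immediate":
        return a                        # operand = address field
    if mode == "direct":
        return memory[a]                # EA = A
    if mode == "indirect":
        return memory[memory[a]]        # EA = (A)
    if mode == "register":
        return registers[a]             # EA = R
    if mode == "register_indirect":
        return memory[registers[a]]     # EA = (R)
    raise ValueError(f"unknown mode: {mode}")


def fetch_displacement(a, r, memory, registers):
    """Displacement addressing uses two fields: EA = A + (R)."""
    return memory[a + registers[r]]


memory = {10: 20, 20: 99, 25: 42}       # assumed memory contents
registers = {"R1": 20, "R2": 5}         # assumed register contents

for mode, field in [("immediate", 10), ("direct", 10), ("indirect", 10),
                    ("register", "R1"), ("register_indirect", "R1")]:
    print(mode, fetch_operand(mode, field, memory, registers))
print("displacement", fetch_displacement(20, "R2", memory, registers))
```

Counting the `memory[...]` lookups in each branch gives the number of memory references per mode: none for immediate and register, one for direct and register indirect, two for indirect.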
Relative Addressing
A version of displacement addressing
R = Program counter, PC
EA = A + (PC)
i.e. get operand from A cells from current location pointed to by PC
c.f. locality of reference & cache usage
Base-Register Addressing
A holds displacement
R holds pointer to base address
R may be explicit or implicit
e.g. segment registers in 80x86
Indexed Addressing
A = base
R = displacement
EA = A + R
Good for accessing arrays
- EA = A + R
- R++
Combinations
Postindex
- EA = (A) + (R)
Preindex
- EA = (A+(R))
(Draw the diagrams)
Stack Addressing
Operand is (implicitly) on top of stack
e.g.
- ADD: pop top two items from stack and add
Pentium Addressing Modes
Virtual or effective address is offset into segment.
- Starting address plus offset gives linear address.
- This goes through page translation if paging enabled.
12 addressing modes available
- Immediate
- Register operand
- Displacement
- Base
- Base with displacement
- Scaled index with displacement
- Base with index and displacement
- Base scaled index with displacement
- Relative
Instruction Types
Instructions generally fall into four types.
- Data processing
- Data storage (main memory)
- Data movement (I/O)
- Program flow control
Design Decisions (1)
Operation repertoire
- How many ops?
- What can they do?
- How complex are they?
Data types
Instruction formats
- Length of op code field
- Number of addresses
Design Decisions (2)
Registers
- Number of CPU registers available
- Which operations can be performed on which registers?
Addressing modes (later…)
RISC v CISC
Types of Operation
There are several types of operations, as follows.
- Data Transfer
- Arithmetic
- Logical
- Conversion
- I/O
- System Control
- Transfer of Control
Arithmetic
Arithmetic operations include Add, Subtract, Multiply, Divide.
Operands can be signed integers.
Can arithmetic operations also process floating point numbers?
May include:
- Increment (a++)
- Decrement (a--)
- Negate (-a)
Shift and Rotate Operations
Logical right shift
Logical left shift
Arithmetic right shift
Arithmetic left shift
Right rotate
Left rotate
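Most of these operations can be written as short functions on an 8-bit pattern. This is a sketch under assumptions: the 8-bit width and the function names are chosen for illustration, and arithmetic left shift is left out because its treatment of the sign bit differs between definitions.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1           # 0xFF keeps results within 8 bits
SIGN = 1 << (WIDTH - 1)           # 0x80 is the sign bit position


def logical_left(x):
    return (x << 1) & MASK        # bit shifted out on the left is lost


def logical_right(x):
    return (x & MASK) >> 1        # a 0 enters on the left


def arithmetic_right(x):
    return ((x & MASK) >> 1) | (x & SIGN)   # sign bit is copied back in


def rotate_left(x):
    return ((x << 1) & MASK) | ((x & MASK) >> (WIDTH - 1))


def rotate_right(x):
    return ((x & MASK) >> 1) | ((x & 1) << (WIDTH - 1))


x = 0b10010110
for f in (logical_left, logical_right, arithmetic_right,
          rotate_left, rotate_right):
    print(f.__name__, format(f(x), "08b"))
```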
Logical and Conversion
Logical
- Bitwise operations
- Logical operations are AND, OR, NOT, etc.
Conversion
- e.g. Binary to Decimal
Input/Output
May be specific instructions.
May be done using data movement instructions (memory mapped).
May be done by a separate controller (DMA).
Transfer of Control
Branch
- e.g. branch to x if result is zero
Skip
- e.g. increment and skip if zero
- ISZ Register1: Skip if zero
- Branch xxxx
Subroutine call
- c.f. interrupt call: jump to interrupt service routine
Branch Instruction
Unconditional Branch
- Jump to 211 unconditionally.
Conditional Branch 1
- Jump to 211 if accumulator is zero.
Conditional Branch 2
- Jump to 235 if R1 equals R2.
Nested Procedure Calls
If the main program calls procedure 1, control goes to Proc.1 and its body is processed.
If Proc.1 calls another procedure (Proc.2), control goes to Proc.2 and its body is processed.
When Proc.2 reaches a RETURN instruction, control returns to Proc.1.
Computer Architecture: Arithmetic and Logical Operations of Computer
Lecture #6, #7
87Computer Architecture
Arithmetic & Logic Unit
Does the calculations.
Everything else in the computer is there to service this unit.
Handles integers.
May handle floating point (real) numbers.
There may be a separate FPU (maths co-processor).
Integer Representation
Only have 0 & 1 to represent everything
Positive numbers stored in binary
- e.g. 41 = 00101001
There is no minus sign.
There is no period (binary point).
Negative numbers use sign-magnitude, one’s complement, or two’s complement.
Sign-Magnitude
Leftmost bit is sign bit.
0 means positive.
1 means negative.
+18 = 00010010
-18 = 10010010
Problems
- Need to consider both sign and magnitude in arithmetic
- Two representations of zero (+0 and -0)
Two’s Complement
+3 = 00000011
+2 = 00000010
+1 = 00000001
+0 = 00000000
-1 = 11111111
-2 = 11111110
-3 = 11111101
Benefits
- Two’s complement has one representation of zero.
- Arithmetic works easily (see later).
- Negating is fairly easy.
  - 3 = 00000011
  - Boolean complement gives 11111100
  - Add 1 to LSB: 11111101
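The negation rule (complement every bit, then add 1) is easy to check in code. A minimal sketch assuming an 8-bit word; the helper names `twos_complement` and `to_signed` are hypothetical.

```python
def twos_complement(value, bits=8):
    """Negate a bit pattern: Boolean complement, then add 1."""
    mask = (1 << bits) - 1
    return ((~value & mask) + 1) & mask


def to_signed(pattern, bits=8):
    """Interpret a bit pattern as a two's complement signed integer."""
    if pattern & (1 << (bits - 1)):      # sign bit set means negative
        return pattern - (1 << bits)
    return pattern


negated = twos_complement(0b00000011)
print(format(negated, "08b"))   # 11111101
print(to_signed(negated))       # -3
```

Negating zero gives zero again, which is the single representation of zero listed under Benefits.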
Logical Operations
AND, OR, XOR, NOT
Selective-set, Selective-complement
Masking, Insert, Compare
Bitwise operations
- Logical Shift
- Circular Shift
- Arithmetic Shift
- Shift with Carry
Shift and Rotate Operations
93Computer Architecture
Addition and Subtraction
Normal binary addition
Monitor sign bit for overflow
Take two’s complement of subtrahend and add to minuend.
- i.e. a - b = a + (-b)
So we only need addition and complement circuits.
Hardware for Addition and Subtraction
OF: overflow bit
SW: Switch (select addition or subtraction)
[Figure: block diagram with B Register, Complementer, SW, Adder, A Register, and OF]
Multiplication
Is complex
Work out partial product for each digit
Take care with place value (column)
Add partial products
Multiplication Example
       1011   Multiplicand (11 dec)
     x 1101   Multiplier (13 dec)
       1011   Partial products
      0000
     1011
    1011
   10001111   Product (143 dec)
Note: if multiplier bit is 1, copy multiplicand (place value), otherwise zero
Note: need double length result
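The partial-product method in the example translates directly into a shift-and-add loop for unsigned integers. A sketch only; the function name `shift_add_multiply` is hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Unsigned multiplication by summing shifted partial products."""
    product = 0
    shift = 0
    while multiplier:
        if multiplier & 1:                    # multiplier bit is 1:
            product += multiplicand << shift  # copy multiplicand at place value
        multiplier >>= 1                      # move to the next multiplier bit
        shift += 1
    return product


result = shift_add_multiply(0b1011, 0b1101)
print(format(result, "08b"), result)  # 10001111 143
```

Two 4-bit inputs produce an 8-bit product, which is why the slide asks for a double length result.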
Booth’s Algorithm
START
1. A ← 0, Q-1 ← 0, M ← Multiplicand, Q ← Multiplier, Counter ← n
2. Examine Q0, Q-1
   - = 10: A ← A - M
   - = 01: A ← A + M
   - = 11 or = 00: no addition or subtraction
3. Arithmetic Shift Right of A, Q, Q-1
4. Counter ← Counter - 1
5. Counter = 0?
   - No: go back to step 2
   - Yes: END
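The flowchart can be followed step by step in code. The sketch below keeps A, Q, and Q-1 as separate n-bit values, as in the flowchart; the function name `booth_multiply` and the final conversion of the 2n-bit result to a signed Python integer are assumptions added for illustration.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit two's complement integers with Booth's algorithm."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    a = 0                          # A <- 0
    q_1 = 0                        # Q-1 <- 0
    m = multiplicand & mask        # M <- multiplicand (n-bit pattern)
    q = multiplier & mask          # Q <- multiplier (n-bit pattern)
    for _ in range(n):             # Counter runs from n down to 0
        pair = (q & 1, q_1)        # examine Q0, Q-1
        if pair == (1, 0):
            a = (a - m) & mask     # A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask     # A <- A + M
        # Arithmetic shift right of A, Q, Q-1 as one combined register.
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & sign)
    result = (a << n) | q          # product is held in A,Q (2n bits)
    if result & (1 << (2 * n - 1)):
        result -= 1 << (2 * n)     # reinterpret as a signed value
    return result


print(booth_multiply(7, 3, 4))    # 21
print(booth_multiply(-7, 3, 4))   # -21
```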
Division
More complex than multiplication
Negative numbers are really bad!
Based on long division
Division of Unsigned Binary Integers
- Dividend: 10010011
- Divisor: 1011
- Partial remainders: 001110, 001111
- Quotient: 00001101
- Remainder: 100
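Binary long division for unsigned integers brings down one dividend bit at a time and subtracts the divisor whenever it fits. The sketch below reproduces the example; the function name `long_divide` and the explicit bit-width parameter are assumptions.

```python
def long_divide(dividend, divisor, n):
    """Divide an n-bit unsigned dividend, one bit per step."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for i in range(n - 1, -1, -1):
        # Bring down the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:     # divisor fits: quotient bit is 1
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


q, r = long_divide(0b10010011, 0b1011, 8)
print(format(q, "08b"), format(r, "b"))  # 00001101 100
```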
Computer Architecture: Real Numbers
Lecture #8, #9
100Computer Architecture
Real Numbers
Numbers with fractions
Could be done in pure binary
- 1001.1010 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625
Where is the binary point?
Fixed?
- Very limited
Moving?
- How do you show where it is?
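The pure binary example can be checked by summing the weight of each 1 bit on either side of the binary point. A small sketch; `binary_to_decimal` is a hypothetical helper name.

```python
def binary_to_decimal(text):
    """Convert a binary string with an optional binary point to a number."""
    whole, _, fraction = text.partition(".")
    value = int(whole, 2) if whole else 0
    for position, bit in enumerate(fraction, start=1):
        if bit == "1":
            value += 2 ** -position   # weights 2^-1, 2^-2, ... after the point
    return value


print(binary_to_decimal("1001.1010"))  # 9.625
```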
Floating Point
+/- .significand x 2^exponent
Point is actually fixed between sign bit and body of mantissa.
Exponent indicates place value (point position).
[Figure: word layout with Sign bit, Biased Exponent, and Mantissa]
Floating Point Examples
(a) 32-bit floating point format
S (1 bit) | E field (8 bits) | Mantissa field (23 bits)
(b) Example of a data representation
- Sign (S) bit = 0
- Exponent (E) field = 00000101
- Mantissa (M) field = 1101 0000 0000 0000 0000 000
- Stored word: 0 00000101 11010000000000000000000
Signs for Floating Point
Mantissa is stored in 2’s complement.
Exponent is in excess or biased notation.
- e.g. Excess (bias) 128 means
- 8 bit exponent field
- Pure value range 0-255
- Subtract 128 to get correct value
- Range -128 to +127
Normalization
FP numbers are usually normalized.
i.e. exponent is adjusted so that leading bit (MSB) of mantissa is 1.
Since it is always 1 there is no need to store it.
c.f. scientific notation, where numbers are normalized to give a single digit before the decimal point.
- e.g. 3.123 x 10^3
FP Ranges
For a 32 bit number
- 8 bit exponent
- +/- 2^256 ≈ 1.2 x 10^77
Accuracy
- The effect of changing lsb of mantissa
- 23 bit mantissa: 2^-23 ≈ 1.2 x 10^-7
Expressible Numbers
107Computer Architecture
IEEE 754
Standard for floating point storage
32 and 64 bit standards
8 and 11 bit exponent respectively
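The 1/8/23-bit layout of the 32-bit IEEE 754 format can be inspected with Python's standard `struct` module. A minimal sketch; `float32_fields` is a hypothetical helper name, and the stored exponent it returns is the biased value (bias 127 in the 32-bit format).

```python
import struct


def float32_fields(x):
    """Split a number's IEEE 754 single-precision encoding into its fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits, leading 1 is not stored
    return sign, exponent, fraction


sign, exponent, fraction = float32_fields(-6.5)   # -6.5 = -1.101 x 2^2
print(sign, exponent - 127, format(fraction, "023b"))
```

For -6.5 the fraction field starts with 101, the bits after the implicit leading 1 described on the Normalization slide.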
Floating Point Arithmetic
FP Arithmetic +/-
- Check for zeros
- Align significands (adjusting exponents)
- Add or subtract significands
- Normalize result
FP Arithmetic x/÷
- Check for zero
- Add/subtract exponents
- Multiply/divide significands (watch sign)
- Normalize
- Round
- All intermediate results should be in double length storage
Floating Point Multiplication
110Computer Architecture
Computer Architecture: Control Unit
Lecture #10
111Computer Architecture
Control Unit
Functions of control unit
- Decoding of an instruction code
- Generation of control signals for instruction execution
Micro-instruction
- Control word
Micro-program
- Set of micro-instructions
Routine
- Group of micro-instructions for a special function of the CPU
- e.g. fetch cycle routine, execution cycle routine, interrupt cycle routine
Structure of Control Unit
Configuration elements
- Instruction decoder
- Control address register (CAR)
- Control memory: internal memory to store the micro-programs
- Control buffer register (CBR)
- Subroutine register (SBR)
- Sequencing module
Internal Structure of Control Unit
[Figure: Instruction Register, Instruction Decoder, CAR, Control Memory Device, CBR, Decoder, SBR, and Sequencing Module, with External Control Signals, Internal Control Signals, and Condition Flags]
Internal Structure of the Control Memory Device
Example
- Capacity of CMD = 128 words
- The first half (addresses 0 ~ 63): stores common routines
- The second half (addresses 64 ~ 127): stores execution routines of each instruction
[Figure: control memory map. Addresses 0 to 63 hold the Fetch Cycle Routine, Indirect Cycle Routine, and Interrupt Cycle Routine; addresses 64 to 127 hold Execution Cycle Routine 1, Execution Cycle Routine 2, and so on]
Mapping
Instruction Code
Mapping Function
Computer Architecture
117
Binary Codes and Symbols for Micro Operations (Examples)
Op field 1
Code  Micro-operation      Symbol
000   None                 NOP
001   MAR <- PC            PCTAR
010   MAR <- IR(addr)      IRTAR
011   AC <- AC + MBR       ADD
100   MBR <- M[MAR]        READ
101   AC <- MBR            BRTAC
110   IR <- MBR            BRTIR
111   M[MAR] <- MBR        WRITE
Op field 2
Code  Micro-operation      Symbol
000   None                 NOP
001   PC <- PC + 1         INCPC
010   MBR <- AC            ACTBR
011   MBR <- PC            PCTBR
100   PC <- MBR            BRTPC
101   MAR <- SP            SPTAR
110   AC <- AC - MBR       SUB
111   PC <- IR(addr)       IRTPC
Micro-programming
Fetch Cycle Routine
        ORG 0
FETCH:  PCTAR        U  JMP  NEXT   ; MAR <- PC, then execute the next micro-instruction
        READ, INCPC  U  JMP  NEXT   ; MBR <- M[MAR], PC <- PC+1, then execute the next micro-instruction
        BRTIR        U  MAP         ; IR <- MBR, then branch to the execution cycle
[Table: binary bit pattern of each micro-instruction]
Indirect Cycle RoutineMicro instruction routine of the indirect cycle
Binary Bit Pattern
Execution of next instruction
Execution of next instruction
Return to the execution cycle
Computer Architecture
121
Execution Cycle Routine
Instruction Op code Starting address of the routine
Computer Architecture
122
Execution Cycle Routines for each instruction
; Call the indirect cycle routine if I=1
; Call the indirect cycle routine if I=1
Computer Architecture
Computer Architecture: Memory Devices
Lecture #11
123Computer Architecture
Memory Classification
• Main memory
- Internal memory
• Auxiliary storage device
- External memory
Memory Hierarchy
Registers
- In CPU
Internal or Main memory
- May include one or more levels of cache
- “RAM”
External memory
- Backing store
Semiconductor Memory Types
126Computer Architecture
Semiconductor Memory
RAM
- Misnamed as all semiconductor memory is random access
- Read/Write
- Volatile
- Temporary storage
- Static or dynamic
Memory Cell Operation
[Figure: memory cell operation. (a) Write: Select, Control, and Data In lines to the Cell. (b) Read: Select, Control, and Sense lines]
Dynamic RAM
Bits stored as charge in capacitors
Charges leak
Need refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Need refresh circuits
Slower
Main memory
Essentially analogue
- Level of charge determines value
Refreshing
Refresh circuit included on chip
Disable chip
Count through rows
Read & write back
Takes time
Slows down apparent performance
Dynamic RAM Structure
[Figure: DRAM cell with Address Line, Bit Line B, Transistor, Storage Capacitor, and Ground]
DRAM Operation
Address line active when bit read or written
- Transistor switch closed (current flows)
Write
- Voltage to bit line
  - High for 1, low for 0
- Then signal address line
  - Transfers charge to capacitor
Read
- Address line selected
  - Transistor turns on
- Charge from capacitor fed via bit line to sense amplifier
  - Compares with reference value to determine 0 or 1
- Capacitor charge must be restored
Typical 16 Mb DRAM (4M x 4)
133Computer Architecture
Static RAM
Bits stored as on/off switches
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Cache
Digital
- Uses flip-flops
Static RAM Structure
[Figure: SRAM cell with dc voltage, Ground, Address Line, two bit lines, transistors T1 to T6, and nodes C1 and C2]
SRAM and DRAM
Both volatile
- Power needed to preserve data
DRAM
- Simpler to build, smaller
- More dense
- Less expensive
- Needs refresh
- Larger memory units
SRAM
- Faster
- Used in cache
Read Only Memory (ROM)
Permanent storage—Nonvolatile
Microprogramming (see later)
Library subroutines
Systems programs (BIOS)
Function tables
137Computer Architecture
Types of ROM
Written during manufacture
- Very expensive for small runs
Programmable (once)
- PROM
- Needs special equipment to program
Read “mostly”
- Erasable Programmable (EPROM)
  - Erased by UV
- Electrically Erasable (EEPROM)
  - Takes much longer to write than read
- Flash memory
  - Erase whole memory electrically
Packaging
139Computer Architecture
140
Design of Memory Device Module
[Example] Design of 1Kx32 bit memory device module using 1K×8 bit RAM chips
– Method : parallel connection of 4 RAM chips
– Capacity of module: (1K×8) × 4 = 1K×32 bits = 1K words
– Address bits(10 bits: A9∼A0) : Common connection to all chips
– Address area: 000H ∼ 3FFH (H: Hexadecimal)
– Data Store: 8 bits/chip
Computer Architecture
141
Design of 1K×32 bits Memory Device Module
Address(A9-0)
Data Bus(32 bits)
Computer Architecture
142
Design of Memory Device Module(con’t)
[Example] Design of 4K×8 bit memory device module using 1K×8 bit RAM chips
– Method: serial connection of 4 RAM chips
– Capacity of module: (1K×8) × 4 = 4K×8 bits = 4K bytes
– Address bits (12 bits: A11∼A0):
  + upper 2 bits: generation of 4 chip select signals using address decoder
  + lower 10 bits: common connection to all chips
– Address area: 000H ∼ FFFH (H: Hexadecimal)
– Data Store: 8 bits/address
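The address split in this 4K×8 example can be expressed as two bit operations: the upper 2 bits select one of the four chips and the lower 10 bits address a byte inside it. A minimal sketch; `decode_address` is a hypothetical name, and numbering the chips 0 to 3 is an assumption.

```python
def decode_address(address):
    """Split a 12-bit address into (chip select, offset within chip)."""
    if not 0x000 <= address <= 0xFFF:
        raise ValueError("address must fit in 12 bits (A11..A0)")
    chip = (address >> 10) & 0b11     # upper 2 bits drive the 2x4 decoder
    offset = address & 0x3FF          # lower 10 bits go to every chip
    return chip, offset


for address in (0x000, 0x3FF, 0x400, 0xFFF):
    chip, offset = decode_address(address)
    print(f"{address:03X}H -> chip {chip}, offset {offset:03X}H")
```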
143
Design of 4K×8 bits Memory Device Module
2×4 Decoder
Data(D7-0)
Computer Architecture
144
Address Areas of each RAM
[Table: address area (from, to) of each RAM chip, by chip number]
Design Procedure of Memory Module
Design procedure
1. Decide the memory capacity for the computer system
2. Choose the chips and design the address map
3. Design the circuit in detail
Memory Design for 8-bit Micro Computer
• Capacity: 1K bytes RAM, 512 bytes ROM
• Address: RAM = 0 ~, ROM = 800H ~
• Useful chips: 256×8 bits RAM, 512×8 bits ROM
o Address table (columns: Memory Chip, Address Area (Hexadecimal), Address bits)
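The address areas implied by these requirements can be computed with a small Python sketch. It assumes the chips are placed back to back from the stated base addresses. The helper `address_map` and the chip labels RAM1 to RAM4 and ROM1 are hypothetical names, not taken from the slides.

```python
def address_map(base, chip_bytes, count, name):
    """Return (chip label, first address, last address) for consecutive chips."""
    rows = []
    for i in range(count):
        first = base + i * chip_bytes
        rows.append((f"{name}{i + 1}", first, first + chip_bytes - 1))
    return rows


# 1K bytes of RAM built from 256-byte chips, starting at address 0.
ram_rows = address_map(0x000, 256, 1024 // 256, "RAM")
# 512 bytes of ROM built from one 512-byte chip, starting at 800H.
rom_rows = address_map(0x800, 512, 512 // 512, "ROM")

for chip, first, last in ram_rows + rom_rows:
    print(f"{chip}: {first:03X}H ~ {last:03X}H")
```

Under these assumptions the design needs four RAM chips and one ROM chip, and the address bits above each chip's own offset bits are what the decoder has to distinguish.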
[Figure] Design example of memory device for 8-bit micro computer: address lines, decoder (outputs 3, 2, 1, 0), 8-bit data lines
Cache Memory
[Wikipedia definition] A cache is a component that improves performance by transparently storing data such that future requests for that data can be served faster.
Purpose: a high-speed memory installed between the CPU and main memory to minimize the time the CPU waits because of the speed difference between the two.
Characteristics
• Uses memory chips with a higher access speed than main memory
• Small capacity because of price and limited space
[Figure] CPU, Cache, Main Memory (the cache sits between the CPU and main memory)
Cache Memory
Cache hit: the data the CPU wants to access already exists in the cache
Cache miss: the data the CPU wants to access does not exist in the cache
Cache hit ratio (H): the ratio (or percentage) of memory accesses that result in cache hits
  H = (number of cache hits) / (total number of memory accesses)
Cache miss ratio = (1 - H)
Average access time of memory device (Ta):
  Ta = H × Tc + (1 - H) × Tm
  Tc: cache access time, Tm: main memory access time
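As a worked example of the formula for Ta, the following Python sketch evaluates it for one set of values. The numbers (H = 0.95, Tc = 10 ns, Tm = 100 ns) are assumed for illustration only.

```python
def average_access_time(hit_ratio, t_cache, t_main):
    """Ta = H * Tc + (1 - H) * Tm, with both times in the same unit."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_main


# Assumed example values: 95% hits, 10 ns cache, 100 ns main memory.
ta = average_access_time(0.95, 10, 100)
print(f"Ta is about {ta:.1f} ns")
```

With these values Ta comes out near 14.5 ns, much closer to the cache time than to the main memory time, which is why even a small cache with a high hit ratio pays off.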
Computer Architecture: Cache Memory
Lecture #12
So you want fast?
It is possible to build a computer which uses only static RAM (see later).
This would be very fast.
This would need no cache.
—How can you cache cache?
This would cost a very large amount.
Locality of Reference
During the course of the execution of a program, memory references tend to cluster.
e.g. loops
Cache
Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module
[Figure] CPU and Cache exchange words (word transfer); Cache and Main Memory exchange blocks (block transfer)
Cache operation - overview
CPU requests contents of memory location.
Check cache for this data.
If present, get from cache (fast).
If not present, read required block from main memory to cache.
Then deliver from cache to CPU.
Cache includes tags to identify which block of main memory is in each cache slot.
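The read path described above can be sketched as follows. This is a minimal model, not a description of real hardware: the class `SimpleCache`, the 4-word block size, and the first-in first-out choice of which slot to overwrite are all assumptions for this example, since the slides do not specify a replacement policy here.

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # assumed number of words per block


class SimpleCache:
    """Hypothetical cache model: tag -> copy of one main memory block."""

    def __init__(self, memory, num_slots):
        self.memory = memory
        self.num_slots = num_slots
        self.slots = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, address):
        tag, offset = divmod(address, BLOCK_SIZE)  # tag = block number
        if tag in self.slots:
            self.hits += 1                         # present: serve from cache
        else:
            self.misses += 1
            if len(self.slots) >= self.num_slots:
                self.slots.popitem(last=False)     # assumed FIFO replacement
            start = tag * BLOCK_SIZE
            # Block transfer from main memory into the cache.
            self.slots[tag] = self.memory[start:start + BLOCK_SIZE]
        return self.slots[tag][offset]             # word delivered to the CPU


memory = list(range(100, 164))   # 64 words of made-up data
cache = SimpleCache(memory, num_slots=2)
values = [cache.read(a) for a in (0, 1, 2, 3, 0)]
print(values, cache.hits, cache.misses)
```

The first access misses and loads a whole block; the following accesses to neighbouring addresses hit, which is the effect of locality of reference.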
Size does matter
Cost
—More cache is expensive.
Speed
—More cache is faster (up to a point).
—Checking cache for data takes time.
Typical Cache Organization
Mapping Function
Cache of 64 KBytes
Cache block of 4 bytes
—i.e. cache is 16K (2^14) lines of 4 bytes
16 MBytes main memory
24-bit address
—(2^24 = 16M)
Direct Mapping
Each block of main memory maps to only one cache line.
—i.e. if a block is in cache, it must be in one specific place
Address is in two parts.
Least significant w bits identify a unique word.
Most significant s bits specify one memory block.
The MSBs are split into a cache line field r and a tag of s-r (most significant).
Direct Mapping-Address Structure
Tag Field (t)   Slot Field (s)   Word Field (w)
8 bits          14 bits          2 bits
• 24-bit address
• 2-bit word identifier (4-byte block)
• 22-bit block identifier
—8-bit tag (= 22 - 14)
—14-bit slot or line
• No two blocks in the same line have the same Tag field.
• Check contents of cache by finding line and checking Tag.
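The 8/14/2 split of a 24-bit address can be tried out with a short Python sketch. The function name `split_direct` and the sample address 16339CH are chosen here for illustration.

```python
TAG_BITS, SLOT_BITS, WORD_BITS = 8, 14, 2  # field widths from the address structure


def split_direct(address):
    """Split a 24-bit address into (tag, slot, word) fields."""
    if not 0 <= address < (1 << (TAG_BITS + SLOT_BITS + WORD_BITS)):
        raise ValueError("address must fit in 24 bits")
    word = address & ((1 << WORD_BITS) - 1)
    slot = (address >> WORD_BITS) & ((1 << SLOT_BITS) - 1)
    tag = address >> (WORD_BITS + SLOT_BITS)
    return tag, slot, word


tag, slot, word = split_direct(0x16339C)
print(f"tag={tag:02X}H slot={slot:04X}H word={word}")
```

On a lookup, the slot field selects the line and the stored tag of that line is compared with the tag field of the address.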
Direct Mapping - Cache Slot Table
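Cache Slot   Main Memory blocks held
0            0, m, 2m, 3m, …, 2^s - m
1            1, m+1, 2m+1, …, 2^s - m + 1
…
m-1          m-1, 2m-1, 3m-1, …, 2^s - 1
(m: number of slots in cache; 2^s: number of blocks in main memory, where s is the number of block identifier bits)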
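Direct Mapping Cache Organization
[Figure] The memory address is divided into Tag, Slot, and Word fields. The slot field selects one of Slot(0) … Slot(i) … Slot(m-1) in the cache, each holding a tag and data. A comparator checks the stored tag against the address tag: a match is a cache hit; a mismatch is a cache miss and main memory is accessed.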
Direct Mapping Summary
Address length = (t + s + w) bits
Number of addressable units = 2^(t+s+w) words or bytes
Block size = 2^w words or bytes
Number of blocks in main memory = 2^(t+s+w) / 2^w = 2^(t+s)
Number of slots in cache = m = 2^s
Size of tag = t bits
Direct Mapping Characteristics
Simple
Inexpensive
Fixed location for given block
—If a program repeatedly accesses 2 blocks that map to the same line, the cache miss rate is very high.
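The conflict problem can be made visible with a tiny simulation. The cache size (4 slots of 4 bytes) and the two access patterns are assumptions picked to keep the example small; `count_misses` is a hypothetical helper.

```python
NUM_SLOTS = 4    # assumed tiny direct-mapped cache
BLOCK_SIZE = 4   # assumed bytes per block


def count_misses(addresses):
    """Count misses for a sequence of byte addresses in a direct-mapped cache."""
    tags = [None] * NUM_SLOTS          # tag currently stored in each slot
    misses = 0
    for address in addresses:
        block = address // BLOCK_SIZE
        slot = block % NUM_SLOTS       # the one slot this block may occupy
        tag = block // NUM_SLOTS
        if tags[slot] != tag:
            misses += 1
            tags[slot] = tag           # the new block overwrites the old one
    return misses


conflicting = [0, 16] * 8   # blocks 0 and 4 both map to slot 0
friendly = [0, 4] * 8       # blocks 0 and 1 map to slots 0 and 1

print(count_misses(conflicting), count_misses(friendly))
```

In the conflicting pattern every access evicts the block the next access needs, so all 16 accesses miss, while the other pattern misses only on the first touch of each block.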
Associative Mapping
A main memory block can load into any line of cache.
Memory address is interpreted as tag and word.
Tag uniquely identifies block of memory.
Every line’s tag is examined for a match.
Cache searching gets expensive.
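Fully Associative Cache Organization
[Figure] The memory address is divided into Tag and Word fields. The address tag is compared with the tag stored in every slot, Slot(0) … Slot(i) … Slot(m-1): a match is a cache hit; no match is a cache miss and main memory is accessed.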
Associative Mapping Example
[Figure] Main memory (128 bytes) addressed by Tag and Word fields; cache (32 bytes) in which each slot stores a 5-bit tag and 32 bits of data
Associative Mapping - Address Structure
Tag: 5 bits   Word: 2 bits
5-bit tag stored with each 32-bit block of data
Compare tag field with tag entry in cache to check for hit
Least significant 2 bits of address identify which byte is required from the 32-bit data block
Associative Mapping Summary
Address length = (t + w) bits
Number of addressable units = 2^(t+w) words or bytes
Slot size = 2^w words or bytes
Number of blocks in main memory = 2^(t+w) / 2^w = 2^t
Number of slots in cache = undetermined (not fixed by the address format)
Size of tag = t bits
Set Associative Mapping
Cache is divided into a number of sets.
Each set contains a number of lines.
A given block maps to any line in a given set.
—e.g. Block B can be in any line of set i.
e.g. 2 lines per set
—2 way associative mapping
—A given block can be in one of 2 lines in only one set.
Address fields: Tag Field, Set Field, Word Field
Set Associative Mapping Example
Tag Set Word
23 2
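[Figure] The memory address is divided into Tag, Set, and Word fields. The set field selects one of Set(0) … Set(i) … Set(m-1), each containing Slot(0) and Slot(1) with a tag and data. A comparator checks the address tag against the tags in the selected set: a match is a cache hit; no match is a cache miss and main memory is accessed.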
Set Associative Mapping - Address Structure
Tag: 9 bits   Set: 13 bits   Word: 2 bits
Use set field to determine cache set to look in.
Compare tag field to see if we have a hit.
e.g.
Address      Tag   Data       Set number
1FF 7FFC     1FF   12345678   1FFF
001 7FFC     001   11223344   1FFF
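The two example addresses can be checked with a short Python sketch. It assumes, as the example suggests, that an address written as two groups consists of the 9-bit tag followed by the remaining 15 bits; the helper names `join` and `split_set_associative` are hypothetical.

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2  # field widths from the address structure


def join(tag, low15):
    """Build a 24-bit address from a 9-bit tag and the lower 15 bits (assumed notation)."""
    return (tag << (SET_BITS + WORD_BITS)) | low15


def split_set_associative(address):
    """Split a 24-bit address into (tag, set number, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word


for tag_in, low in ((0x1FF, 0x7FFC), (0x001, 0x7FFC)):
    tag, set_no, word = split_set_associative(join(tag_in, low))
    print(f"tag={tag:03X}H set={set_no:04X}H word={word}")
```

Both addresses fall into set 1FFFH with different tags, so in a 2-way set associative cache they can be held at the same time.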
Set Associative Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in set = k
Number of sets = v = 2^d
Number of lines in cache = kv = k × 2^d
Size of tag = (s - d) bits
Pentium 4 Cache
80386 – no on chip cache
80486 – 8k using 16 byte lines and four way set associative organization
Pentium (all versions) – two on chip L1 caches
— Data & instructions
Pentium 4 – L1 caches
— 8k bytes
— 64 byte lines
— four way set associative
L2 cache
— Feeding both L1 caches
— 256k
— 128 byte lines
— 8 way set associative
Pentium 4 Core Processor
Fetch/Decode Unit
— Fetches instructions from L2 cache
— Decode into micro-ops
— Store micro-ops in L1 cache
Out of order execution logic
— Schedules micro-ops
— Based on data dependence and resources
— May speculatively execute
Execution units
— Execute micro-ops
— Data from L1 cache
— Results in registers
Memory subsystem
— L2 cache and systems bus
Pentium 4 Design
Decodes instructions into RISC like micro-ops before L1 cache
Micro-ops fixed length
— Superscalar pipelining and scheduling
Pentium instructions long & complex
Performance improved by separating decoding from scheduling & pipelining
— (More later – ch14)
Data cache is write back
— Can be configured to write through
L1 cache controlled by 2 bits in register
— CD = cache disable
— NW = not write through
— 2 instructions to invalidate (flush) cache and write back then invalidate
DRAM
1. Synchronous DRAM (SDRAM)
— Adds a clock signal to the DRAM interface, so that repeated transfers do not bear the overhead of synchronizing with the DRAM controller
2. Double Data Rate (DDR SDRAM)
— Transfers data on both the rising edge and the falling edge of the DRAM clock signal, doubling the peak data rate
— DDR2 lowers power by dropping the voltage from 2.5 to 1.8 volts and offers higher clock rates: up to 400 MHz
— DDR3 drops to 1.5 volts and offers higher clock rates: up to 800 MHz
Improved bandwidth, not latency
DRAM
Standard   Clock Rate (MHz)   M transfers/second   DRAM Name    Mbytes/s/DIMM   DIMM Name
DDR        133                266                  DDR266       2128            PC2100
DDR        150                300                  DDR300       2400            PC2400
DDR        200                400                  DDR400       3200            PC3200
DDR2       266                533                  DDR2-533     4264            PC4300
DDR2       333                667                  DDR2-667     5336            PC5300
DDR2       400                800                  DDR2-800     6400            PC6400
DDR3       533                1066                 DDR3-1066    8528            PC8500
DDR3       666                1333                 DDR3-1333    10664           PC10700
DDR3       800                1600                 DDR3-1600    12800           PC12800
M transfers/second = clock rate × 2; Mbytes/s/DIMM = M transfers/second × 8
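The two multipliers in the table can be reproduced with a short Python sketch. The function name `ddr_rates` is hypothetical; the factor 8 corresponds to the 8 bytes (64 bits) a DIMM transfers at once.

```python
def ddr_rates(clock_mhz):
    """Return (M transfers/second, Mbytes/s per DIMM) for a DDR clock rate in MHz."""
    transfers = clock_mhz * 2   # data moves on both the rising and falling edge
    mbytes = transfers * 8      # 8 bytes per transfer on a 64-bit DIMM
    return transfers, mbytes


for clock in (133, 200, 400, 800):
    transfers, mbytes = ddr_rates(clock)
    print(f"{clock} MHz -> {transfers} M transfers/s, {mbytes} Mbytes/s")
```

For some rows the table differs slightly from this arithmetic (for example 266 MHz gives 532, not 533), because the listed clock rates are rounded values.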
Error Correction
Motivation:
—Failures/time proportional to number of bits!
—As DRAM cells shrink, more vulnerable
Went through a period in which the failure rate was low enough without error correction that people didn’t do correction
—DRAM banks too large now
—Servers always corrected memory systems
Basic idea: add redundancy through parity bits
—Common configuration: random error correction
– SEC-DED (single error correct, double error detect)
– One example: 64 data bits + 8 parity bits (11% overhead)
—Really want to handle failures of physical components as well
– Organization is multiple DRAMs/DIMM, multiple DIMMs
– Want to recover from failed DRAM and failed DIMM!
– “Chip kill” handles failures the width of a single DRAM chip
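To show how parity bits give SEC-DED behaviour, here is a scaled-down Python sketch that protects 4 data bits with 4 parity bits (an extended Hamming code). It is an illustration only: memory systems use the same idea on wider words such as 64 data bits + 8 parity bits, and the function names and bit layout below are assumptions for this example.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # positions 1, 2, 4 hold parity; position 0 is overall parity


def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword, returned as a list of bits."""
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity enables double error detection
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'double error'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i            # XOR of the positions holding a 1
    overall = sum(code) % 2          # 0 when the total parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1          # single error: the syndrome is its position
        status = "corrected"
    else:
        return "double error", None  # parity consistent but syndrome nonzero
    nibble = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, nibble


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1
print(decode(word), decode(one_flip), decode(two_flips))
```

A single flipped bit is located and repaired, while two flipped bits are reported but cannot be repaired, which is the SEC-DED property named above.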