13 Reduced Instruction Set Computers Computer Organization.


Major Advances in Computers (1)
  The family concept
    IBM System/360 1964
    DEC PDP-8
    Separates architecture from implementation
  Microprogrammed control unit
    Idea by Wilkes 1951
    Produced by IBM S/360 1964
  Cache memory
    IBM S/360 model 85 1969

Major Advances in Computers (2)
  Solid State RAM
    (See memory notes)
  Microprocessors
    Intel 4004 1971
  Pipelining
    Introduces parallelism into fetch execute cycle
  Multiple processors

The Next Step - RISC
  Reduced Instruction Set Computer
  Key features
    Large number of general purpose registers, or use of compiler technology to optimize register use
    Limited and simple instruction set
    Emphasis on optimising the instruction pipeline

Comparison of processors

Driving force for CISC
  Software costs far exceed hardware costs
  Increasingly complex high level languages
  Semantic gap
  Leads to:
    Large instruction sets
    More addressing modes
    Hardware implementations of HLL statements
      e.g. CASE (switch) on VAX

Intention of CISC
  Ease compiler writing
  Improve execution efficiency
    Complex operations in microcode
  Support more complex HLLs

Execution Characteristics
  Operations performed
  Operands used
  Execution sequencing
  Studies have been done based on programs written in HLLs
  Dynamic studies are measured during the execution of the program

Operations
  Assignments
    Movement of data
  Conditional statements (IF, LOOP)
    Sequence control
  Procedure call-return is very time consuming
  Some HLL instructions lead to many machine code operations

Weighted Relative Dynamic Frequency of HLL Operations [PATT82a]

          Dynamic         Machine-Instruction   Memory-Reference
          Occurrence      Weighted              Weighted
          Pascal  C       Pascal  C             Pascal  C
ASSIGN    45%     38%     13%     13%           14%     15%
LOOP      5%      3%      42%     32%           33%     26%
CALL      15%     12%     31%     33%           44%     45%
IF        29%     43%     11%     21%           7%      13%
GOTO      -       3%      -       -             -       -
OTHER     6%      1%      3%      1%            2%      1%

Operands
  Mainly local scalar variables
  Optimisation should concentrate on accessing local variables

                    Pascal   C     Average
Integer Constant    16%      23%   20%
Scalar Variable     58%      53%   55%
Array/Structure     26%      24%   25%

Procedure Calls
  Very time consuming
  Depends on number of parameters passed
  Depends on level of nesting
  Most programs do not do a lot of calls followed by lots of returns
  Most variables are local
  (c.f. locality of reference)

Implications
  Best support is given by optimising most used and most time consuming features
  Large number of registers
    Operand referencing
  Careful design of pipelines
    Branch prediction etc.
  Simplified (reduced) instruction set

Large Register File
  Software solution
    Require compiler to allocate registers
    Allocate based on most used variables in a given time
    Requires sophisticated program analysis
  Hardware solution
    Have more registers
    Thus more variables will be in registers

Registers for Local Variables
  Store local scalar variables in registers
  Reduces memory access
  Every procedure (function) call changes locality
  Parameters must be passed
  Results must be returned
  Variables from calling programs must be restored

Register Windows
  Only few parameters
  Limited range of depth of call
  Use multiple small sets of registers
  Calls switch to a different set of registers
  Returns switch back to a previously used set of registers

Register Windows cont.
  Three areas within a register set
    Parameter registers
    Local registers
    Temporary registers
  Temporary registers from one set overlap parameter registers from the next
  This allows parameter passing without moving data
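
The overlap between one window's temporary registers and the next window's parameter registers can be sketched as an index calculation. This is a minimal illustration, not a description of any real machine: the window count, the area sizes, and the helper name `physical_index` are all assumptions made for the example.

```python
# Hypothetical sizes for illustration; not taken from any particular machine.
NUM_WINDOWS = 4
PARAM_REGS = 6                      # temporaries use the same count so the areas can overlap
LOCAL_REGS = 10
STRIDE = PARAM_REGS + LOCAL_REGS    # physical registers contributed by each window
FILE_SIZE = NUM_WINDOWS * STRIDE    # size of the circular register file

# Offset of each area from the start of its window, and the size of each area.
AREA_START = {"param": 0, "local": PARAM_REGS, "temp": STRIDE}
AREA_SIZE = {"param": PARAM_REGS, "local": LOCAL_REGS, "temp": PARAM_REGS}


def physical_index(window, area, n):
    """Map register n of the given area in the given window to a physical index."""
    if not 0 <= n < AREA_SIZE[area]:
        raise ValueError(f"{area} register {n} out of range")
    base = (window % NUM_WINDOWS) * STRIDE
    # The temporaries start exactly where the next window's parameters start.
    return (base + AREA_START[area] + n) % FILE_SIZE


# The caller (window 0) writes an argument into its temporary register 2 ...
caller_slot = physical_index(0, "temp", 2)
# ... and the callee (window 1) reads it as its parameter register 2.
callee_slot = physical_index(1, "param", 2)
print(caller_slot == callee_slot)   # True: same physical register, no data moved
```

Because both names resolve to the same physical register, the call only has to change which window is current; the argument is already in place.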

Overlapping Register Windows

Circular Buffer diagram

Operation of Circular Buffer
  When a call is made, a current window pointer is moved to show the currently active register window
  If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
  A saved window pointer indicates where the next saved windows should restore to
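
The call and return behaviour described above can be sketched as a small simulation. This is a simplified model under stated assumptions: the class name `RegisterWindows` and its counters are hypothetical, register contents are not modelled, and the sketch lets every window hold an activation, ignoring the effect of window overlap on capacity.

```python
class RegisterWindows:
    """Toy model of a circular buffer of register windows (contents not modelled)."""

    def __init__(self, num_windows):
        self.num_windows = num_windows
        self.cwp = 0          # current window pointer
        self.resident = 1     # windows held in registers (the main program uses one)
        self.spilled = 0      # windows currently saved in memory
        self.overflows = 0    # interrupts taken because all windows were in use
        self.underflows = 0   # interrupts taken to restore a saved window

    def call(self):
        if self.resident == self.num_windows:
            # All windows in use: the oldest one is saved to memory.
            self.spilled += 1
            self.overflows += 1
        else:
            self.resident += 1
        self.cwp = (self.cwp + 1) % self.num_windows

    def ret(self):
        if self.resident == 1:
            if self.spilled == 0:
                raise RuntimeError("return without a matching call")
            # The caller's window was saved earlier: restore it from memory.
            self.spilled -= 1
            self.underflows += 1
        else:
            self.resident -= 1
        self.cwp = (self.cwp - 1) % self.num_windows


rw = RegisterWindows(num_windows=4)
for _ in range(5):      # nest five calls deep
    rw.call()
print(rw.cwp, rw.overflows)     # 1 2
for _ in range(5):      # unwind completely
    rw.ret()
print(rw.cwp, rw.underflows)    # 0 2
```

With four windows, nesting five calls deep forces two saves to memory, and unwinding forces two restores. A program that stays within a narrow range of call depth never triggers either.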

Global Variables
  Allocated by the compiler to memory
    Inefficient for frequently accessed variables
  Have a set of registers for global variables

Registers v Cache

Large Register File                              Cache
All local scalars                                Recently-used local scalars
Individual variables                             Blocks of memory
Compiler-assigned global variables               Recently-used global variables
Save/Restore based on procedure nesting depth    Save/Restore based on cache replacement algorithm
Register addressing                              Memory addressing

Referencing a Scalar - Window Based Register File

Referencing a Scalar - Cache

Compiler Based Register Optimization
  Assume small number of registers (16-32)
  Optimizing use is up to compiler
  HLL programs have no explicit references to registers
    usually - think about C - register int
  Assign symbolic or virtual register to each candidate variable
  Map (unlimited) symbolic registers to real registers
  Symbolic registers that do not overlap can share real registers
  If you run out of real registers some variables use memory
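
One way to picture "symbolic registers that do not overlap" is to give each symbolic register a live range and test the ranges pairwise. The sketch below assumes a simplified model in which a live range is a single half-open interval of instruction positions. The function name `interference_edges` and the example ranges are hypothetical.

```python
from itertools import combinations


def interference_edges(live_ranges):
    """Return the pairs of symbolic registers whose live ranges overlap.

    live_ranges maps a symbolic register name to a half-open interval
    (start, end) of instruction positions where that register is live.
    """
    edges = set()
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(
        sorted(live_ranges.items()), 2
    ):
        # Two half-open intervals overlap when each starts before the other ends.
        if a_start < b_end and b_start < a_end:
            edges.add((a, b))
    return edges


# Hypothetical live ranges for four symbolic registers.
ranges = {"A": (0, 4), "B": (2, 6), "C": (4, 8), "D": (6, 9)}
print(sorted(interference_edges(ranges)))
# [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

Here A and C never overlap, so they could share one real register, and likewise B and D. Pairs that do overlap must be kept in different real registers or one of them must live in memory.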

Graph Coloring
  Given a graph of nodes and edges
  Assign a color to each node
  Adjacent nodes have different colors
  Use minimum number of colors
  Nodes are symbolic registers
  Two registers that are live in the same program fragment are joined by an edge
  Try to color the graph with n colors, where n is the number of real registers
  Nodes that can not be colored are placed in memory
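
The coloring step can be sketched with a simple greedy heuristic. This is an illustration only: greedy coloring does not guarantee the minimum number of colors, production compilers use more elaborate allocators, and the function name `color_registers` and the example graph are hypothetical.

```python
def color_registers(nodes, edges, num_real):
    """Greedily assign real registers (colors 0..num_real-1) to symbolic registers.

    Returns (assignment, spilled): a dict from node to real register number,
    and a list of nodes that could not be colored and must use memory.
    """
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    assignment = {}
    spilled = []
    # Visit the most constrained nodes first (highest degree, ties broken by name).
    for node in sorted(nodes, key=lambda n: (-len(neighbors[n]), n)):
        used = {assignment[m] for m in neighbors[node] if m in assignment}
        free = [r for r in range(num_real) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)    # no color left: this variable stays in memory
    return assignment, spilled


# Hypothetical interference graph: A, B, C are mutually live; D overlaps only C.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

print(color_registers(nodes, edges, num_real=3))
# ({'C': 0, 'A': 1, 'B': 2, 'D': 1}, [])
print(color_registers(nodes, edges, num_real=2))
# ({'C': 0, 'A': 1, 'D': 1}, ['B'])
```

With three real registers every symbolic register gets a color, and A and D share one because they are never live together. With two, the triangle A, B, C cannot be colored, so one of its nodes is placed in memory.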

Graph Coloring Approach

Why CISC (1)?
  Compiler simplification?
    Disputed…
    Complex machine instructions harder to exploit
    Optimization more difficult
  Smaller programs?
    Program takes up less memory but…
    Memory is now cheap
    May not occupy less bits, just look shorter in symbolic form
      More instructions require longer op-codes
      Register references require fewer bits

Why CISC (2)?
  Faster programs?
    Bias towards use of simpler instructions
    More complex control unit
    Microprogram control store larger
    thus simple instructions take longer to execute
  It is far from clear that CISC is the appropriate solution

RISC Characteristics
  One instruction per cycle
  Register to register operations
  Few, simple addressing modes
  Few, simple instruction formats
  Hardwired design (no microcode)
  Fixed instruction format
  More compile time/effort

RISC v CISC
  Not clear cut
  Many designs borrow from both philosophies
  e.g. PowerPC and Pentium II

RISC Pipelining
  Most instructions are register to register
  Two phases of execution
    I: Instruction fetch
    E: Execute
      ALU operation with register input and output
  For load and store
    I: Instruction fetch
    E: Execute
      Calculate memory address
    D: Memory
      Register to memory or memory to register operation

Effects of Pipelining

Optimization of Pipelining
  Delayed branch
    Does not take effect until after execution of following instruction
    This following instruction is the delay slot

Normal and Delayed Branch

Address   Normal Branch   Delayed Branch   Optimized Delayed Branch
100       LOAD X, rA      LOAD X, rA       LOAD X, rA
101       ADD 1, rA       ADD 1, rA        JUMP 105
102       JUMP 105        JUMP 106         ADD 1, rA
103       ADD rA, rB      NOOP             ADD rA, rB
104       SUB rC, rB      ADD rA, rB       SUB rC, rB
105       STORE rA, Z     SUB rC, rB       STORE rA, Z
106                       STORE rA, Z
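
The three columns of the table can be checked with a small interpreter. This is a sketch under stated assumptions: the instruction semantics (source operand first, destination second), the initial value of X, and the function name `run` are chosen for the example, and a delayed jump is modelled simply as "take effect after the next instruction executes".

```python
def run(program, delayed, x=5):
    """Execute a program from address 100; return (value stored in Z, addresses executed)."""
    regs = {"rA": 0, "rB": 0, "rC": 0}
    mem = {"X": x, "Z": None}
    pc, pending, trace = 100, None, []
    while pc in program:
        op, *args = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        slot_target, pending = pending, None   # set if this instruction is a delay slot
        if op == "LOAD":
            regs[args[1]] = mem[args[0]]
        elif op == "STORE":
            mem[args[1]] = regs[args[0]]
        elif op in ("ADD", "SUB"):
            src = regs[args[0]] if isinstance(args[0], str) else args[0]
            regs[args[1]] += src if op == "ADD" else -src
        elif op == "JUMP":
            if delayed:
                pending = args[0]              # takes effect after the next instruction
            else:
                next_pc = args[0]
        # NOOP falls through and does nothing.
        pc = slot_target if slot_target is not None else next_pc
    return mem["Z"], trace


tail = [("ADD", "rA", "rB"), ("SUB", "rC", "rB"), ("STORE", "rA", "Z")]
normal = dict(enumerate(
    [("LOAD", "X", "rA"), ("ADD", 1, "rA"), ("JUMP", 105)] + tail, start=100))
delayed_prog = dict(enumerate(
    [("LOAD", "X", "rA"), ("ADD", 1, "rA"), ("JUMP", 106), ("NOOP",)] + tail, start=100))
optimized = dict(enumerate(
    [("LOAD", "X", "rA"), ("JUMP", 105), ("ADD", 1, "rA")] + tail, start=100))

print(run(normal, delayed=False))       # (6, [100, 101, 102, 105])
print(run(delayed_prog, delayed=True))  # (6, [100, 101, 102, 103, 106])
print(run(optimized, delayed=True))     # (6, [100, 101, 102, 105])
```

All three versions store the same value in Z. The plain delayed version executes one extra instruction (the NOOP in the delay slot), while the optimized version fills the slot with the useful ADD and executes as few instructions as the normal branch.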

Use of Delayed Branch

Controversy
  Quantitative
    compare program sizes and execution speeds
  Qualitative
    examine issues of high level language support and use of VLSI real estate

Problems
  No pair of RISC and CISC that are directly comparable
  No definitive set of test programs
  Difficult to separate hardware effects from compiler effects
  Most comparisons done on “toy” rather than production machines
  Most commercial devices are a mixture
