Date posted: 05-Jan-2016
13 Reduced Instruction Set Computers
Computer Organization
Major Advances in Computers (1)
  The family concept
    IBM System/360 1964
    DEC PDP-8
    Separates architecture from implementation
  Microprogrammed control unit
    Idea by Wilkes 1951
    Produced by IBM S/360 1964
  Cache memory
    IBM S/360 model 85 1969
Major Advances in Computers (2)
  Solid State RAM
    (See memory notes)
  Microprocessors
    Intel 4004 1971
  Pipelining
    Introduces parallelism into fetch execute cycle
  Multiple processors
The Next Step - RISC
  Reduced Instruction Set Computer
  Key features
    Large number of general purpose registers, or use of compiler technology to optimize register use
    Limited and simple instruction set
    Emphasis on optimising the instruction pipeline
Comparison of processors
Driving force for CISC
  Software costs far exceed hardware costs
  Increasingly complex high level languages
  Semantic gap
  Leads to:
    Large instruction sets
    More addressing modes
    Hardware implementations of HLL statements
      e.g. CASE (switch) on VAX
Intention of CISC
  Ease compiler writing
  Improve execution efficiency
    Complex operations in microcode
  Support more complex HLLs
Execution Characteristics
  Operations performed
  Operands used
  Execution sequencing
  Studies have been done based on programs written in HLLs
  Dynamic studies are measured during the execution of the program
Operations
  Assignments
    Movement of data
  Conditional statements (IF, LOOP)
    Sequence control
  Procedure call-return is very time consuming
  Some HLL instructions lead to many machine code operations
Weighted Relative Dynamic Frequency of HLL Operations [PATT82a]
            Dynamic Occurrence    Machine-Instruction Weighted    Memory-Reference Weighted
            Pascal    C           Pascal    C                     Pascal    C
  ASSIGN    45%       38%         13%       13%                   14%       15%
  LOOP      5%        3%          42%       32%                   33%       26%
  CALL      15%       12%         31%       33%                   44%       45%
  IF        29%       43%         11%       21%                   7%        13%
  GOTO      -         3%          -         -                     -         -
  OTHER     6%        1%          3%        1%                    2%        1%
Operands
  Mainly local scalar variables
  Optimisation should concentrate on accessing local variables

                      Pascal    C      Average
  Integer Constant    16%       23%    20%
  Scalar Variable     58%       53%    55%
  Array/Structure     26%       24%    25%
Procedure Calls
  Very time consuming
  Depends on number of parameters passed
  Depends on level of nesting
  Most programs do not do a lot of calls followed by lots of returns
  Most variables are local
  (c.f. locality of reference)
Implications
  Best support is given by optimising most used and most time consuming features
  Large number of registers
    Operand referencing
  Careful design of pipelines
    Branch prediction etc.
  Simplified (reduced) instruction set
Large Register File
  Software solution
    Require compiler to allocate registers
    Allocate based on most used variables in a given time
    Requires sophisticated program analysis
  Hardware solution
    Have more registers
    Thus more variables will be in registers
Registers for Local Variables
  Store local scalar variables in registers
  Reduces memory access
  Every procedure (function) call changes locality
  Parameters must be passed
  Results must be returned
  Variables from calling programs must be restored
Register Windows
  Only few parameters
  Limited range of depth of call
  Use multiple small sets of registers
  Calls switch to a different set of registers
  Returns switch back to a previously used set of registers
Register Windows cont.
  Three areas within a register set
    Parameter registers
    Local registers
    Temporary registers
  Temporary registers from one set overlap parameter registers from the next
  This allows parameter passing without moving data
Overlapping Register Windows
Circular Buffer diagram
Operation of Circular Buffer
  When a call is made, a current window pointer is moved to show the currently active register window
  If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
  A saved window pointer indicates where the next saved window should be restored to
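The pointer movement described above can be sketched as a small simulation. This is a toy model and not any particular processor's mechanism: the class name `RegisterWindowFile`, the helper `run_trace`, and the trace encoding ("C" for call, "R" for return) are hypothetical, and the model ignores the overlap between adjacent windows, so it assumes all windows can hold a procedure activation at once.

```python
class RegisterWindowFile:
    """Toy circular buffer of register windows (illustrative assumption)."""

    def __init__(self, num_windows):
        if num_windows < 2:
            raise ValueError("need at least two windows")
        self.num_windows = num_windows
        self.cwp = 0            # current window pointer: the active window
        self.swp = 0            # saved window pointer: oldest resident window
        self.resident = 1       # windows currently held in registers
        self.memory_stack = []  # window slots saved to memory, oldest first
        self.spills = 0
        self.restores = 0

    def call(self):
        if self.resident == self.num_windows:
            # All windows in use: save the oldest window to memory
            # (this is where the interrupt would be raised).
            self.memory_stack.append(self.swp)
            self.swp = (self.swp + 1) % self.num_windows
            self.resident -= 1
            self.spills += 1
        self.cwp = (self.cwp + 1) % self.num_windows
        self.resident += 1

    def ret(self):
        if self.resident == 1:
            # The caller's window was saved earlier: restore it first.
            if not self.memory_stack:
                raise RuntimeError("return without a matching call")
            self.memory_stack.pop()
            self.swp = (self.swp - 1) % self.num_windows
            self.resident += 1
            self.restores += 1
        self.cwp = (self.cwp - 1) % self.num_windows
        self.resident -= 1


def run_trace(num_windows, trace):
    """Replay a string of 'C' (call) and 'R' (return) events."""
    rwf = RegisterWindowFile(num_windows)
    for event in trace:
        if event == "C":
            rwf.call()
        elif event == "R":
            rwf.ret()
        else:
            raise ValueError("unknown event: " + event)
    return rwf.spills, rwf.restores


print(run_trace(4, "CCCCCRRRRR"))  # deep nesting: (2, 2)
print(run_trace(4, "CRCRCRCRCR"))  # shallow nesting: (0, 0)
```

With four windows, five nested calls force two saves to memory and two later restores, while the same number of calls at a nesting depth of one causes none. This matches the earlier point that the scheme relies on a limited range of call depth.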
Global Variables
  Allocated by the compiler to memory
    Inefficient for frequently accessed variables
  Have a set of registers for global variables
Registers v Cache
  Large Register File                              Cache
  All local scalars                                Recently-used local scalars
  Individual variables                             Blocks of memory
  Compiler-assigned global variables               Recently-used global variables
  Save/Restore based on procedure nesting depth    Save/Restore based on cache replacement algorithm
  Register addressing                              Memory addressing
Referencing a Scalar - Window Based Register File
Referencing a Scalar - Cache
Compiler Based Register Optimization
  Assume small number of registers (16-32)
  Optimizing use is up to compiler
  HLL programs have no explicit references to registers
    usually - think about C - register int
  Assign symbolic or virtual register to each candidate variable
  Map (unlimited) symbolic registers to real registers
  Symbolic registers that do not overlap can share real registers
  If you run out of real registers some variables use memory
Graph Coloring
  Given a graph of nodes and edges
  Assign a color to each node
  Adjacent nodes have different colors
  Use minimum number of colors
  Nodes are symbolic registers
  Two registers that are live in the same program fragment are joined by an edge
  Try to color the graph with n colors, where n is the number of real registers
  Nodes that can not be colored are placed in memory
Graph Coloring Approach
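A minimal sketch of the coloring idea, under stated assumptions: the interference graph below (symbolic registers A to E and their edges) is made up for illustration, the function names `build_interference` and `color_registers` are hypothetical, and the greedy highest-degree-first order is only one simple heuristic. It does not guarantee the minimum number of colors.

```python
def build_interference(edges):
    """Build an undirected interference graph from (a, b) pairs.

    An edge means both symbolic registers are live in the same
    program fragment, so they cannot share a real register.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def color_registers(interference, num_registers):
    """Greedily assign real registers (colors 0..n-1) to symbolic ones.

    Returns (assignment, spilled): spilled nodes could not be colored
    and would be kept in memory instead.
    """
    assignment = {}
    spilled = []
    # Heuristic assumption: handle the most constrained nodes first.
    order = sorted(interference, key=lambda n: (-len(interference[n]), n))
    for node in order:
        used = {assignment[m] for m in interference[node] if m in assignment}
        free = [r for r in range(num_registers) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)
    return assignment, spilled


# Hypothetical example graph, not taken from the slides.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("D", "E")]
graph = build_interference(edges)

three_regs, spilled3 = color_registers(graph, 3)
two_regs, spilled2 = color_registers(graph, 2)
print(three_regs, spilled3)
print(two_regs, spilled2)
```

With three real registers every symbolic register in this example gets a color; with two, some nodes cannot be colored and fall back to memory.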
Why CISC (1)?
  Compiler simplification?
    Disputed…
    Complex machine instructions harder to exploit
    Optimization more difficult
  Smaller programs?
    Program takes up less memory but…
    Memory is now cheap
    May not occupy fewer bits, just look shorter in symbolic form
      More instructions require longer op-codes
      Register references require fewer bits
Why CISC (2)?
  Faster programs?
    Bias towards use of simpler instructions
    More complex control unit
    Microprogram control store larger
    thus simple instructions take longer to execute
  It is far from clear that CISC is the appropriate solution
RISC Characteristics
  One instruction per cycle
  Register to register operations
  Few, simple addressing modes
  Few, simple instruction formats
  Hardwired design (no microcode)
  Fixed instruction format
  More compile time/effort
RISC v CISC
  Not clear cut
  Many designs borrow from both philosophies
  e.g. PowerPC and Pentium II
RISC Pipelining
  Most instructions are register to register
  Two phases of execution
    I: Instruction fetch
    E: Execute
      ALU operation with register input and output
  For load and store
    I: Instruction fetch
    E: Execute
      Calculate memory address
    D: Memory
      Register to memory or memory to register operation
Effects of Pipelining
Optimization of Pipelining
  Delayed branch
    Does not take effect until after execution of following instruction
    This following instruction is the delay slot
Normal and Delayed Branch
  Address    Normal Branch    Delayed Branch    Optimized Delayed Branch
  100        LOAD X, rA       LOAD X, rA        LOAD X, rA
  101        ADD 1, rA        ADD 1, rA         JUMP 105
  102        JUMP 105         JUMP 106          ADD 1, rA
  103        ADD rA, rB       NOOP              ADD rA, rB
  104        SUB rC, rB       ADD rA, rB        SUB rC, rB
  105        STORE rA, Z      SUB rC, rB        STORE rA, Z
  106                         STORE rA, Z
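The three columns of the table can be checked with a small interpreter. This is a sketch under assumptions not stated in the slides: the tuple encoding of instructions, the function name `run`, the reading of `ADD src, dst` as `dst = dst + src` (and `SUB` likewise), and the initial values of X, rA, rB and rC are all chosen for illustration. A jump placed inside a delay slot is not modeled.

```python
def run(program, delayed, memory, registers):
    """Execute a tiny program; return (memory, registers, trace of addresses).

    With delayed=True a JUMP takes effect only after the instruction
    that follows it (the delay slot) has executed.
    """
    mem, regs = dict(memory), dict(registers)
    trace = []
    pending = None            # jump target waiting for its delay slot
    pc = min(program)

    def value(src):
        # An operand is either an integer literal or a register name.
        return src if isinstance(src, int) else regs[src]

    while pc in program:
        instr = program[pc]
        trace.append(pc)
        next_pc = pc + 1 if pending is None else pending
        pending = None
        op = instr[0]
        if op == "JUMP":
            if delayed:
                pending = instr[1]
            else:
                next_pc = instr[1]
        elif op == "LOAD":
            regs[instr[2]] = mem[instr[1]]
        elif op == "ADD":
            regs[instr[2]] += value(instr[1])
        elif op == "SUB":
            regs[instr[2]] -= value(instr[1])
        elif op == "STORE":
            mem[instr[2]] = regs[instr[1]]
        # NOOP does nothing
        pc = next_pc
    return mem, regs, trace


LOAD, ADD1 = ("LOAD", "X", "rA"), ("ADD", 1, "rA")
ADDB, SUBB = ("ADD", "rA", "rB"), ("SUB", "rC", "rB")
STORE, NOOP = ("STORE", "rA", "Z"), ("NOOP",)

normal = {100: LOAD, 101: ADD1, 102: ("JUMP", 105),
          103: ADDB, 104: SUBB, 105: STORE}
delayed_prog = {100: LOAD, 101: ADD1, 102: ("JUMP", 106), 103: NOOP,
                104: ADDB, 105: SUBB, 106: STORE}
optimized = {100: LOAD, 101: ("JUMP", 105), 102: ADD1,
             103: ADDB, 104: SUBB, 105: STORE}

start_mem = {"X": 7}                       # assumed initial memory
start_regs = {"rA": 0, "rB": 10, "rC": 3}  # assumed initial registers

results = {
    "normal": run(normal, False, start_mem, start_regs),
    "delayed": run(delayed_prog, True, start_mem, start_regs),
    "optimized": run(optimized, True, start_mem, start_regs),
}
for name, (mem, regs, trace) in results.items():
    print(name, mem["Z"], trace)
```

All three versions store the same value in Z. The plain delayed version executes one extra instruction (the NOOP in the delay slot), while the optimized version fills the slot with the ADD and executes as many instructions as the normal branch.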
Use of Delayed Branch
Controversy
  Quantitative
    compare program sizes and execution speeds
  Qualitative
    examine issues of high level language support and use of VLSI real estate
Problems
  No pair of RISC and CISC that are directly comparable
  No definitive set of test programs
  Difficult to separate hardware effects from compiler effects
  Most comparisons done on “toy” rather than production machines
  Most commercial devices are a mixture