University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell
CS352H: Computer Systems Architecture
Lecture 3: Instruction Set Architectures II
September 3, 2009
ISA is a Contract
Between the programmer and the hardware:
  Defines visible state of the system
  Defines how the state changes in response to instructions
Programmer obtains a model of how programs will execute
Hardware designer obtains a formal definition of the correct way to execute instructions
ISA Specification:
  Instruction set
  How instructions modify the state of the machine
  Binary representation
Today:
  ISA principles
  ISA evolution
ISA Specification
Machine state
  Memory organization
  Register organization
Instruction formats
Instruction types
Addressing modes
Data types
Operations
Interrupts/Events
Instruction representation
Architecture vs. Implementation
Architecture defines what a computer system does in response to an instruction and data
  Architectural components are visible to the programmer
Implementation defines how a computer system does it
  Sequence of steps
  Time (cycles)
  Bookkeeping functions
Architecture or Implementation?
Number of GP registers
Width of memory word
Width of memory bus
Binary representation of: add r3, r3, r9
# of cycles to execute a FP instruction
Size of the instruction cache
How condition codes are set on an ALU overflow
Machine State
Registers (size & type)
  PC
  Accumulators
  Index
  General purpose
  Control
Memory
  Visible hierarchy (if any)
  Addressability
    Bit, byte, word
    Endian-ness
    Maximum size
  Protection
Components of Instructions
Operations (opcodes)
Number of operands
Operand specifiers (names)
  Can be implicit
Instruction classes
  ALU
  Branch
  Memory
  …
Instruction encodings
Number of Operands
None      halt, nop
One       not R4                R4 ← ~R4
Two       add R1, R2            R1 ← R1+R2
Three     add R1, R2, R3        R1 ← R2+R3
> three   madd R4,R1,R2,R3      R4 ← R1+(R2*R3)
Effect of Number of Operands
Given E = (C + D) * (C - D)
And C, D and E in R1, R2 and R3 (resp.)

3 operand machine        2 operand machine
add  R3, R1, R2          mov  R3, R1
sub  R4, R1, R2          add  R3, R2
mult R3, R4, R3          sub  R1, R2
                         mult R3, R1
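The two instruction sequences can be checked with a small register-machine sketch. The `run` helper, the tuple encoding of instructions, and the sample values of C and D are illustrative assumptions, not part of any real ISA. The two-operand sequence shown here treats the destination as the first source (Rd ← Rd op Rs) and overwrites R1 once C is no longer needed.

```python
import operator

# Hypothetical mini-interpreter: each instruction is (opcode, dest, *sources).
OPS = {"add": operator.add, "sub": operator.sub, "mult": operator.mul}

def run(program, regs):
    """Execute a list of instructions on a copy of the register dict."""
    regs = dict(regs)
    for op, dest, *srcs in program:
        if op == "mov":                      # mov Rd, Rs: Rd <- Rs
            regs[dest] = regs[srcs[0]]
        elif len(srcs) == 2:                 # three-operand: Rd <- Rs1 op Rs2
            regs[dest] = OPS[op](regs[srcs[0]], regs[srcs[1]])
        else:                                # two-operand: Rd <- Rd op Rs
            regs[dest] = OPS[op](regs[dest], regs[srcs[0]])
    return regs

three_operand = [
    ("add", "R3", "R1", "R2"),   # R3 <- C + D
    ("sub", "R4", "R1", "R2"),   # R4 <- C - D
    ("mult", "R3", "R4", "R3"),  # R3 <- (C - D) * (C + D)
]
two_operand = [
    ("mov", "R3", "R1"),         # R3 <- C
    ("add", "R3", "R2"),         # R3 <- C + D
    ("sub", "R1", "R2"),         # R1 <- C - D (overwrites C)
    ("mult", "R3", "R1"),        # R3 <- (C + D) * (C - D)
]

C, D = 7, 3                      # sample values, chosen arbitrarily
start = {"R1": C, "R2": D, "R3": 0, "R4": 0}
e3 = run(three_operand, start)["R3"]
e2 = run(two_operand, start)["R3"]
```

The three-operand version needs one fewer instruction but an extra register (R4); the two-operand version needs an explicit mov and destroys one of its inputs.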
Evolution of Register Organization
In the beginning… the accumulator
Two instruction types: op & store
  A ← A op M
  A ← A op *M
  *M ← A
One address architecture
  One memory address per instruction
Two addressing modes:
  Immediate: M
  Direct: *M
Inspired by “tabulating” machines
The Index Register
Add indexed addressing mode
  A ← A op (M+I)
  A ← A op *(M+I)
  *(M+I) ← A
Useful for array processing
  Addr. of X[0] in instruction
  Index value in index register
One register per function:
  PC: instructions
  I: data addresses
  A: data values
Need new instruction to use I
  inc I
  cmp I
Example of Effect of Index Register
Without Index Register
Start: CLR   i
       CLR   sum
Loop:  LOAD  IX
       AND   #MASK
       OR    i
       STORE IX
       LOAD  sum
IX:    ADD   y
       STORE sum
       LOAD  i
       ADD   #1
       STORE i
       CMP   n
       BNE   Loop

With Index Register
Start: CLRA
       CLRX
Loop:  ADDA  y(X)
       INCX
       CMPX  n
       BNE   Loop

sum = 0;
for (i=0; i<n; i++)
    sum = sum + y[i];

But what about…

sum = 0;
for (i=0; i<n; i++)
    for (j=0; j<n; j++)
        sum = sum + x[j] * y[i];
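The single-loop version with an index register can be traced with a short sketch. The function name, the flat `memory` list, and the instruction counter are assumptions made for illustration; like the assembly, the loop tests its exit condition at the bottom, so it assumes n is at least 1.

```python
def sum_with_index_register(memory, y_base, n):
    """Mimic CLRA / CLRX / ADDA y(X) / INCX / CMPX n / BNE Loop."""
    a = 0                          # CLRA: accumulator holds the running sum
    x = 0                          # CLRX: index register holds i
    executed = 2                   # the two clear instructions
    while True:
        a += memory[y_base + x]    # ADDA y(X): address = base in instruction + index
        x += 1                     # INCX
        executed += 4              # ADDA, INCX, CMPX, BNE
        if x == n:                 # CMPX n ; BNE Loop falls through when equal
            break
    return a, executed

memory = [0, 0, 0, 0, 5, 6, 7]     # y[] starts at address 4 (arbitrary layout)
total, executed = sum_with_index_register(memory, 4, 3)
```

Each iteration costs four instructions here, versus twelve in the version that has to patch the address field of the ADD instruction through memory.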
1964: General-Purpose Registers
Merge accumulators (data) & index registers (addresses)
  Simpler
  More orthogonal (opcode independent of register)
  More fast local storage
  But addresses and data must be the same size
How many registers?
  More: fewer loads
  But more instruction bits
IBM 360
Stack Machines
Register state: PC & SP
All instructions performed on TOS & SOS
Implied stack Push & Pop
  TOS ← TOS op SOS
  TOS ← TOS op M
  TOS ← TOS op *M
Many instructions are zero address!
Stack cache for performance
  Like a register file
  Managed by hardware
Pioneered by Burroughs in early 60’s
Renaissance due to JVM
Register-Based ISAs
Why do register-based architectures dominate the market?
  Registers are faster than memory
  Can “cache” variables
    Reduces memory traffic
    Improves code density
  More efficient use by compiler than other internal storage (stack)
    (A*B) – (B*C) – (A*D)
What happened to Register-Memory architectures?
  More difficult for compiler
  Register-Register architectures more amenable to fast implementation
General- versus special-purpose registers?
  Special-purpose examples in MIPS: PC, Hi, Lo
  Compiler wants an egalitarian register society
Stack Code Examples
A = B + C * D;
E = A + F[J] + C;

Pure stack (zero addresses): 11 instr, 7 addr
  push D
  push C
  mul
  push B
  add
  push J
  pushx F
  push C
  add
  add
  pop E

One address stack: 8 instr, 7 addr
  push D
  mul C
  add B
  push J
  pushx F
  add C
  add
  pop E

Load/Store Arch (several GP registers): 10 instr, 6 addr
  load R1, D
  load R2, C
  mul R3, R2, R1
  load R4, B
  add R5, R4, R3
  load R6, J
  load R7, F(R6)
  add R8, R7, R2
  add R9, R5, R8
  store R9, E
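The pure-stack sequence can be exercised with a tiny interpreter. This is a sketch under stated assumptions: `run_stack` is a hypothetical helper, memory is modeled as a Python dict keyed by variable name, and `pushx F` is assumed to pop an index from the stack and push `F[index]`.

```python
def run_stack(program, mem):
    """Interpret a pure-stack program; ALU instructions carry no addresses."""
    stack = []
    for instr in program:
        op, *arg = instr.split()
        if op == "push":                 # push M: push the value stored at M
            stack.append(mem[arg[0]])
        elif op == "pushx":              # pushx F: pop index, push F[index] (assumed)
            stack.append(mem[arg[0]][stack.pop()])
        elif op == "pop":                # pop M: store top of stack to M
            mem[arg[0]] = stack.pop()
        elif op == "add":                # TOS <- TOS + SOS
            stack.append(stack.pop() + stack.pop())
        elif op == "mul":                # TOS <- TOS * SOS
            stack.append(stack.pop() * stack.pop())
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return mem

pure_stack = ["push D", "push C", "mul", "push B", "add",
              "push J", "pushx F", "push C", "add", "add", "pop E"]

# Sample data, chosen arbitrarily for the check.
mem = {"B": 2, "C": 3, "D": 4, "J": 1, "F": [10, 20, 30]}
run_stack(pure_stack, mem)

instr_count = len(pure_stack)
addr_count = sum(1 for i in pure_stack if " " in i)
```

Counting the instructions that carry an operand reproduces the "11 instr, 7 addr" tally for the pure-stack column.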
Memory Organization
ISA specifies five aspects of memory:
  Smallest addressable unit
  Maximum addressable units of memory
  Alignment
    Bytes: any address
    Half words: even addresses
    Words: multiples of 4
  Endian-ness
    Little Endian: Intel, DEC
    Big Endian: IBM, Motorola
    Today: Configurable
  Address modes
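Byte order and alignment can both be illustrated with Python's standard `struct` module. The sample value and the `is_aligned` helper are assumptions for illustration; the helper encodes the natural-alignment rule (an access of size k bytes must start at a multiple of k).

```python
import struct

value = 0x0A0B0C0D

# "<" selects little-endian layout, ">" big-endian; "I" is a 4-byte unsigned int.
little = struct.pack("<I", value)   # least significant byte at the lowest address
big = struct.pack(">I", value)      # most significant byte at the lowest address

def is_aligned(address, size):
    """True if an access of `size` bytes at `address` is naturally aligned."""
    return address % size == 0
```

For example, address 6 is a legal half-word address but not a legal word address under this rule.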
Addressing Modes are Driven by Program Usage
double x[100];                    // global
void foo(int a) {                 // argument
    int j;                        // local
    for (j=1; j<10; j++)
        x[j] = 3 + a*x[j-1];      // array reference, constant
    bar(a);                       // procedure, argument
}
Addressing Mode Types
#n           immediate
(0x1000)     absolute (aka direct)
Rn           register
(Rn)         register indirect
-(Rn)        predecrement
(Rn)+        postincrement
*(Rn)        memory indirect
*(Rn)+       postincrement indirect
d(Rn)        displacement
d(Rn)[Rx]    scaled
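A few of these modes can be expressed as operand-fetch rules over a register file and a memory. This is a minimal sketch: the `operand` function, its keyword parameters, the dict-based memory, and the 4-byte element size used for scaling and postincrement are all assumptions for illustration.

```python
def operand(mode, regs, mem, n=0, d=0, rn=None, rx=None, scale=4):
    """Return the operand value selected by one addressing mode."""
    if mode == "immediate":          # #n
        return n
    if mode == "absolute":           # (n)
        return mem[n]
    if mode == "register":           # Rn
        return regs[rn]
    if mode == "register_indirect":  # (Rn)
        return mem[regs[rn]]
    if mode == "displacement":       # d(Rn)
        return mem[d + regs[rn]]
    if mode == "memory_indirect":    # *(Rn): memory holds the address of the operand
        return mem[mem[regs[rn]]]
    if mode == "scaled":             # d(Rn)[Rx]
        return mem[d + regs[rn] + regs[rx] * scale]
    if mode == "postincrement":      # (Rn)+: use Rn, then advance it
        value = mem[regs[rn]]
        regs[rn] += scale
        return value
    raise ValueError(f"unsupported mode: {mode}")

# Arbitrary sample state for trying the modes out.
mem = {0x1000: 11, 0x100: 0x1000, 0x104: 22, 0x10C: 33}
regs = {"R1": 0x100, "R2": 2}
```

With this state, `operand("displacement", regs, mem, d=4, rn="R1")` reads address 0x104, and `operand("scaled", regs, mem, d=4, rn="R1", rx="R2")` reads address 0x10C.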
Why Only Three Addressing Modes in MIPS?
Studies of code generated for GP computers:
  Register mode: ~50%
  Immediate + Displacement: 35% - 40%
  The VAX had 27 addressing modes!
But special-purpose ISAs make more extensive use of other modes
  Auto-increment in DSP processing
How Many Bits for Displacement?
Depends on storage organization & compiler!
[Figure: DEC Alpha data]
How Many Bits for Immediates?
Same DEC Alpha study as displacement data
A study of the VAX (with support for 32-bit immediates) showed that 20% to 25% of immediate values required more than 16 bits
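Whether a given constant fits in an immediate field comes down to its two's-complement width. The helpers below are a sketch for reasoning about that; their names and the default 16-bit width are assumptions, not taken from the study.

```python
def bits_needed_signed(value):
    """Smallest two's-complement width (in bits) that can hold `value`."""
    if value >= 0:
        return value.bit_length() + 1      # magnitude bits plus a sign bit
    return (~value).bit_length() + 1       # ~value == -value - 1

def fits_immediate(value, width=16):
    """True if `value` can be encoded in a signed field of `width` bits."""
    return bits_needed_signed(value) <= width
```

A constant that fails `fits_immediate` has to be built with more than one instruction or loaded from memory on a machine with 16-bit immediates.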
Data Types
How the contents of memory & registers are interpreted
Can be identified by
  Tag (Symbolics tags)
  Use
Driven by application:
  Signal processing: 16-bit fixed point (fractions)
  Text processing: 8-bit characters
  Scientific processing: 64-bit floating point
GP computers:
  8, 16, 32, 64-bit
  Signed & unsigned
  Fixed & floating
Example: 32-bit Floating Point
Specifies mapping from bits to real numbers
Format
  Sign bit (S)
  8-bit exponent (E)
  23-bit mantissa (M)
Interpretation
  Value = (-1)^S * 2^(E-127) * 1.M
Operations:
  Add, sub, mult, div, sqrt
“Integer” operations can also have fractions
  Assume the binary point is just to the right of the leftmost bit
  0100 1000 0000 1000 = 2^-1 + 2^-4 + 2^-12 = 0.56274
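Both interpretations can be checked numerically. The sketch below applies the formula to a normalized 32-bit pattern and compares it against the bit pattern Python's `struct` module produces; the function names are assumptions, and zero, denormals, infinities and NaNs (which the formula does not cover) are ignored. The fixed-point helper treats the leftmost bit as having weight 2^0.

```python
import struct

def decode_float32(bits):
    """Apply (-1)^S * 2^(E-127) * 1.M to a normalized 32-bit pattern."""
    s = bits >> 31                 # sign bit
    e = (bits >> 23) & 0xFF        # 8-bit biased exponent
    m = bits & 0x7FFFFF            # 23-bit mantissa (fraction)
    return (-1) ** s * 2.0 ** (e - 127) * (1 + m / 2 ** 23)

def float_bits(x):
    """The 32-bit single-precision pattern for x, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def decode_fixed16(bits):
    """16-bit value with the binary point just right of the leftmost bit."""
    return bits / 2 ** 15

fixed_example = decode_fixed16(0b0100_1000_0000_1000)
```

The fixed-point example evaluates to exactly 0.562744140625, which the slide shows truncated to five decimal places.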
Instruction Types
ALU
  Arithmetic (add, sub, mult, …)
  Logical (and, or, srl, …)
  Data type conversions (cvtf2i, …)
  Fused memory/arithmetic
Data movement
  Memory reference (lw, sb, …)
  Register to register (movi2fp, …)
Control
  Test/compare (slt, …)
  Branch, jump (beq, j, jr, …)
  Procedure call (jal, …)
  OS entry (trap)
Complex
  String compare, procedure call (with save/restore), …
Control Instructions
Implicit
  PC ← PC + 4
Unconditional jumps
  PC ← X (direct)
  PC ← PC + X (PC relative)
    X can be a constant or a register
Conditional jumps (branches): > 75% of control instr.
  PC ← PC + ((cond) ? X : 4)
Procedure call/return
Predicated instructions
Conditions
  Flags
  In a register
  Fused compare and branch
Methods for Conditional Jumps
Condition codes
  Tests special bits set by ALU
    Sometimes this is done for free
    CC is extra state constraining instruction order
  X86, ARM, PowerPC
Condition register
  Tests arbitrary register for result of comparison
  Simple
  But uses up a register
  Alpha, MIPS
Fused compare and branch
  Comparison is part of the branch
  Single instruction
  Complicates pipelining
  PA-RISC, VAX, MIPS
Long Branches
beq $7, $8, Label
What if Label is “far away”?
  PC-relative address cannot be encoded in 16 bits
Transform to:
  bne $7, $8, NearbyLabel
  j FarAwayLabel
  NearbyLabel:
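The decision an assembler makes here can be sketched as a range check. The sketch assumes the MIPS convention that the 16-bit field holds a signed word offset relative to the instruction after the branch, and that all addresses are word aligned; `branch_offset_fits` and `lower_beq` are hypothetical helper names.

```python
def branch_offset_fits(branch_addr, target_addr, bits=16):
    """True if the word offset from PC+4 fits a signed field of `bits` bits."""
    offset_words = (target_addr - (branch_addr + 4)) // 4
    return -(1 << (bits - 1)) <= offset_words < (1 << (bits - 1))

def lower_beq(rs, rt, branch_addr, target_addr):
    """Emit beq directly if it reaches, else the inverted branch plus jump."""
    if branch_offset_fits(branch_addr, target_addr):
        return [f"beq {rs}, {rt}, Label"]
    return [f"bne {rs}, {rt}, NearbyLabel",   # skip the jump when rs != rt
            "j FarAwayLabel",                 # taken only when rs == rt
            "NearbyLabel:"]

branch_at = 0x400000                          # arbitrary example address
near = branch_at + 4 + 4 * 32767              # farthest forward target that fits
far = branch_at + 4 + 4 * 32768               # one word too far
```

The transformed sequence costs an extra instruction on the far path, so it is only worth emitting when the range check fails.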
Predication
Branches introduce discontinuities
  If (condition) then
    this
  else
    that
Might translate into
      R11 ← (condition)
      beq R11, R0, L1
      this
      j L2
  L1: that
  L2:
Forced to wait for “beq”

With predication both this and that are evaluated but only the results of the “correct” path are kept
  (condition) this
  (not condition) that
Need
  Predicated instructions
  Predicate registers
  Compiler
IA-64
  64 1-bit predicate registers
  Instructions include extra bits for predicates
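The difference between the two styles can be mimicked in software. In this sketch the "this" and "that" computations (a + b and a - b) and the function names are arbitrary assumptions; the predicated version computes both results unconditionally and uses the predicate only to select which one is kept, with no control transfer.

```python
def branchy(cond, a, b):
    """Branching form: only one of the two paths is executed."""
    if cond:
        return a + b               # "this"
    return a - b                   # "that"

def predicated(cond, a, b):
    """Predicated form: both paths execute, the predicate selects the result."""
    p = int(bool(cond))            # 1-bit predicate register
    this_result = a + b            # executed regardless of the predicate
    that_result = a - b            # executed regardless of the predicate
    # Branch-free select: keep the result whose guard is true.
    return p * this_result + (1 - p) * that_result
```

The predicated form does more work per call but has no branch whose outcome must be known before later instructions can proceed.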
Exceptions/Events
Implied multi-way branch after every instruction
  External events (interrupts)
    I/O completion
  Internal events
    Arithmetic overflow
    Page fault
What happens?
  EPC ← PC of instruction causing fault
  PC ← HW table lookup (based on fault)
  Return to EPC + 4 (sort of)
What about complex “lengthy” instructions?
Control Instructions: Miscellaneous
How many bits for the branch displacement?
Procedure call/return
  Should saving and restoring of registers be done automatically?
  VAX callp instruction
Instruction Formats
Need to specify all kinds of information
  R3 ← R1 + R2
  Jump to address
  Return from call
Frequency varies
  Instructions
  Operand types
Possible encodings:
  Fixed length
  Few lengths
  Byte/bit variable
Variable-Length Instructions
More efficient encodings
  No unused fields/operands
  Can use frequencies when determining opcode, operand & address mode encodings
Examples
  VAX
  Intel x86 (byte variable)
  Intel 432 (bit variable)
At a cost of complicating fast implementation
  Where is the next instruction?
  Sequential operand location determination
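The "where is the next instruction?" problem can be seen in a toy decoder. The encoding here is entirely hypothetical (the low two bits of each opcode byte give the number of operand bytes that follow) and is not the VAX or x86 format; it only shows that each instruction's start depends on decoding the previous one, whereas fixed-length starts can be computed independently.

```python
def variable_length_starts(code):
    """Find instruction start offsets under the hypothetical encoding."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        operand_bytes = code[pc] & 0b11    # known only after reading the opcode
        pc += 1 + operand_bytes            # next start depends on this decode
    return starts

def fixed_length_starts(code, size=4):
    """With fixed-length instructions every start is known up front."""
    return list(range(0, len(code), size))

# Four instructions of lengths 1, 3, 2 and 4 bytes (operand bytes are arbitrary).
code = bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x03, 0x01, 0x02, 0x03])
```

In hardware this serial dependence is what makes fetching and decoding several variable-length instructions per cycle harder.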
Compromise: A Couple of Lengths
Better code density than fixed length
  An issue for embedded processors
Simpler to decode than variable-length
Examples:
  ARM Thumb
  MIPS16
Another approach
  On-the-fly instruction decompression (IBM CodePack)
Next Lecture
Finish ISA Principles
A brief look at the IA-32 ISA
RISC vs. CISC
The MIPS ALU
Hwk #1 due