Computer Architecture: A Constructive Approach
Multi-cycle SMIPS Implementations
Joel Emer, Computer Science & Artificial Intelligence Lab., Massachusetts Institute of Technology
March 5, 2012 (L8), http://csg.csail.mit.edu/6.S078
Harvard-Style Datapath for MIPS
[Figure: Harvard-style MIPS datapath with PC, instruction memory, GPRs (read ports rs1/rs2, write port ws/wd), immediate extender (ImmExt), ALU with ALU control, data memory, and a PC-select mux (pc+4 / br / rind / jabs); control signals RegWrite, MemWrite, WBSrc, RegDst, BSrc, ExtSel, OpCode, OpSel, zero?, PCSrc]
old way
Hardwired Control Table

Opcode    ExtSel  BSrc  OpSel  MemW  RegW  WBSrc  RegDst  PCSrc
ALU       *       Reg   Func   no    yes   ALU    rd      pc+4
ALUi      sExt16  Imm   Op     no    yes   ALU    rt      pc+4
ALUiu     uExt16  Imm   Op     no    yes   ALU    rt      pc+4
LW        sExt16  Imm   +      no    yes   Mem    rt      pc+4
SW        sExt16  Imm   +      yes   no    *      *       pc+4
BEQZ z=0  sExt16  *     0?     no    no    *      *       br
BEQZ z=1  sExt16  *     0?     no    no    *      *       pc+4
J         *       *     *      no    no    *      *       jabs
JAL       *       *     *      no    yes   PC     R31     jabs
JR        *       *     *      no    no    *      *       rind
JALR      *       *     *      no    yes   PC     R31     rind

BSrc = Reg / Imm; WBSrc = ALU / Mem / PC; RegDst = rt / rd / R31; PCSrc = pc+4 / br / rind / jabs
old way
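A hardwired control unit is a fixed mapping from opcode to a vector of control signals. For BEQZ, the zero flag also feeds into the mapping. The table above can be sketched as Python data. The dictionary layout, the row key strings such as "BEQZ z=0", and the `control` helper are illustrative assumptions, not part of the course code.

```python
# Hardwired control table transcribed as data (illustrative sketch).
# "*" marks a don't-care; column order follows the table header.
COLUMNS = ("ExtSel", "BSrc", "OpSel", "MemW", "RegW", "WBSrc", "RegDst", "PCSrc")

CONTROL_TABLE = {
    "ALU":      ("*",      "Reg", "Func", "no",  "yes", "ALU", "rd",  "pc+4"),
    "ALUi":     ("sExt16", "Imm", "Op",   "no",  "yes", "ALU", "rt",  "pc+4"),
    "ALUiu":    ("uExt16", "Imm", "Op",   "no",  "yes", "ALU", "rt",  "pc+4"),
    "LW":       ("sExt16", "Imm", "+",    "no",  "yes", "Mem", "rt",  "pc+4"),
    "SW":       ("sExt16", "Imm", "+",    "yes", "no",  "*",   "*",   "pc+4"),
    "BEQZ z=0": ("sExt16", "*",   "0?",   "no",  "no",  "*",   "*",   "br"),
    "BEQZ z=1": ("sExt16", "*",   "0?",   "no",  "no",  "*",   "*",   "pc+4"),
    "J":        ("*",      "*",   "*",    "no",  "no",  "*",   "*",   "jabs"),
    "JAL":      ("*",      "*",   "*",    "no",  "yes", "PC",  "R31", "jabs"),
    "JR":       ("*",      "*",   "*",    "no",  "no",  "*",   "*",   "rind"),
    "JALR":     ("*",      "*",   "*",    "no",  "yes", "PC",  "R31", "rind"),
}


def control(opcode: str) -> dict:
    """Return one row of control signals, keyed by column name."""
    return dict(zip(COLUMNS, CONTROL_TABLE[opcode]))


# Sanity check: every row supplies exactly one value per column.
assert all(len(row) == len(COLUMNS) for row in CONTROL_TABLE.values())
```

In hardware, the same mapping becomes a ROM or a block of combinational decode logic.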
Single-Cycle SMIPS
[Figure: single-cycle SMIPS datapath with PC, Inst Memory, Decode, Register File, Execute, Data Memory, and a +4 adder for the next PC]
Datapath and control were derived automatically from a high-level rule-based description
new way
Register file with 2 read & 1 write ports; separate Instruction & Data memories
Single-Cycle SMIPS code structure

module mkProc(Proc);
  Reg#(Addr) pc  <- mkRegU;
  RFile      rf  <- mkRFile;
  Memory     mem <- mkTwoPortedMemory;
  let iMem = mem.iport;
  let dMem = mem.dport;

  rule doProc;
    let inst  = iMem(MemReq{op:Ld, addr:pc, data:?});
    let dInst = decode(inst);
    Data rVal1 = rf.rd1(dInst.rSrc1);
    Data rVal2 = rf.rd2(dInst.rSrc2);
    let eInst = exec(dInst, rVal1, rVal2, pc);
    // update rf, pc and dMem
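Decoding Instructions: input-output types
[Figure: the decode function takes a 32-bit instruction and produces a DecodedInst; mux control logic not shown]
instruction (Bit#(32)) is decoded into type DecodedInst with fields:
  iType     IType     from bits 31:26, 5:0
  aluFunc   AluFunc   from bits 31:26, 5:0
  brComp    BrType    from bits 31:26
  rDst      Rindex    from bits 20:16 or 15:11
  rSrc1     Rindex    from bits 25:21
  rSrc2     Rindex    from bits 20:16
  imm       Bit#(32)  extended from bits 15:0 or 25:0
  immValid  Bool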
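The bit ranges above correspond to the standard MIPS instruction fields. The following Python sketch extracts them and assumes MIPS encodings. `bits`, `sext16` and `fields` are hypothetical helper names. The sketch leaves out the choice between alternatives such as rt versus rd for rDst, just as the slide omits the mux control logic.

```python
def bits(x: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit field x[hi:lo], like BSV/Verilog slicing."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext16(x: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit pattern (sExt16 in the control table)."""
    return (x | 0xFFFF0000) if (x & 0x8000) else x


def fields(inst: int) -> dict:
    """Raw fields of a 32-bit instruction, assuming MIPS encodings."""
    return {
        "opcode": bits(inst, 31, 26),  # feeds iType, aluFunc, brComp
        "rs":     bits(inst, 25, 21),  # rSrc1
        "rt":     bits(inst, 20, 16),  # rSrc2, or rDst for immediate forms
        "rd":     bits(inst, 15, 11),  # rDst for register-register forms
        "funct":  bits(inst, 5, 0),    # feeds iType, aluFunc for R-type
        "imm16":  bits(inst, 15, 0),   # extended to form imm
        "target": bits(inst, 25, 0),   # raw jump target field
    }


# addu $t2, $t0, $t1 (R-type) and addiu $t0, $zero, -1 (I-type)
r = fields(0x01095021)
i = fields(0x2408FFFF)
print(r["rs"], r["rt"], r["rd"], hex(sext16(i["imm16"])))
```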
Reading Registers
[Figure: the register file RF is read at RSrc1 and RSrc2, producing RVal1 and RVal2]
Pure combinational logic
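Executing Instructions
[Figure: execute takes dInst, rVal1, rVal2 and pc; an ALU, a branch-address unit and a branch comparator (ALUBr) produce iType, rDst, data, addr and brTaken]
addr: either for a memory reference or a branch target
data: either for an rf write or a St
Pure combinational logic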
Branch Address Calculation

function Addr brAddrCalc(Addr pc, Data val, IType iType, Data imm);
  let targetAddr = case (iType)
    J, Jal   : {pc[31:28], imm[27:0]};
    Jr, Jalr : val;
    default  : pc + imm;
  endcase;
  return targetAddr;
endfunction
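The three cases of brAddrCalc can be checked numerically with a Python transcription that works on 32-bit values. In this sketch, string names stand in for the BSV IType constructors. Like the BSV, the sketch assumes that decode has already produced the final 32-bit offset in imm.

```python
MASK32 = 0xFFFFFFFF


def br_addr_calc(pc: int, val: int, itype: str, imm: int) -> int:
    """Python model of brAddrCalc on 32-bit unsigned values."""
    if itype in ("J", "Jal"):
        # {pc[31:28], imm[27:0]}: keep the top 4 PC bits, take the rest from imm
        return (pc & 0xF0000000) | (imm & 0x0FFFFFFF)
    if itype in ("Jr", "Jalr"):
        return val & MASK32          # register-indirect target
    return (pc + imm) & MASK32       # PC-relative, wrapping modulo 2^32


print(hex(br_addr_calc(0x40001000, 0, "J", 0x100)))
print(hex(br_addr_calc(0x100, 0, "Br", 0xFFFFFFF0)))  # offset of -16
```

The masking in the default case is what makes a two's-complement offset such as 0xFFFFFFF0 behave as a backward branch.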
Some Useful Functions

function Bool memType(IType i);
  return (i == Ld || i == St);
endfunction

function Bool regWriteType(IType i);
  return (i == Alu || i == Ld || i == Jal || i == Jalr);
endfunction

function Bool controlType(IType i);
  return (i == J || i == Jr || i == Jal || i == Jalr || i == Br);
endfunction
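The three predicates classify instruction types along independent axes, so one type can belong to more than one class. For example, Ld both accesses memory and writes a register. The following Python check assumes that IType has exactly the eight values named on these slides; the real enumeration may have more.

```python
from enum import Enum


class IType(Enum):
    # Assumed subset: only the instruction types named on these slides.
    Alu = 0
    Ld = 1
    St = 2
    J = 3
    Jr = 4
    Jal = 5
    Jalr = 6
    Br = 7


def mem_type(i: IType) -> bool:
    return i in (IType.Ld, IType.St)


def reg_write_type(i: IType) -> bool:
    return i in (IType.Alu, IType.Ld, IType.Jal, IType.Jalr)


def control_type(i: IType) -> bool:
    return i in (IType.J, IType.Jr, IType.Jal, IType.Jalr, IType.Br)


# Print which classes each instruction type falls into.
for it in IType:
    classes = [name for name, pred in (("mem", mem_type),
                                       ("regWrite", reg_write_type),
                                       ("control", control_type)) if pred(it)]
    print(f"{it.name:5} {classes}")
```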
Execute Function

function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc);
  ExecInst einst = ?;
  Data aluVal2 = dInst.immValid ? dInst.imm : rVal2;
  let aluRes = alu(rVal1, aluVal2, dInst.aluFunc);
  let brAddr = brAddrCalc(pc, rVal1, dInst.iType, dInst.imm);
  einst.iType   = dInst.iType;
  einst.addr    = memType(dInst.iType) ? aluRes : brAddr;
  einst.data    = dInst.iType == St ? rVal2 : aluRes;
  einst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp);
  einst.rDst    = dInst.rDst;
  return einst;
endfunction
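Here is a Python transcription of exec for a tiny hypothetical subset. The following are illustrative assumptions, not the course's AluFunc and BrType definitions:
- the `alu`, `alu_br` and `br_addr_calc` helpers;
- the operation names Add and Sub;
- the comparison names Eq, Neq, AT (always taken) and NT (never taken).

Like the BSV function, this model touches no state. That is why the same exec can serve both the single-cycle and the two-cycle processors.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass
class DecodedInst:
    iType: str
    aluFunc: str
    brComp: str
    rDst: int
    imm: int
    immValid: bool


@dataclass
class ExecInst:
    iType: str
    rDst: int
    data: int
    addr: int
    brTaken: bool


def alu(a: int, b: int, func: str) -> int:
    # Hypothetical two-operation ALU; results wrap to 32 bits.
    return {"Add": a + b, "Sub": a - b}[func] & MASK32


def alu_br(a: int, b: int, comp: str) -> bool:
    # Hypothetical comparisons: equal, not equal, always taken, never taken.
    return {"Eq": a == b, "Neq": a != b, "AT": True, "NT": False}[comp]


def br_addr_calc(pc: int, val: int, itype: str, imm: int) -> int:
    if itype in ("J", "Jal"):
        return (pc & 0xF0000000) | (imm & 0x0FFFFFFF)
    if itype in ("Jr", "Jalr"):
        return val & MASK32
    return (pc + imm) & MASK32


def exec_inst(d: DecodedInst, rVal1: int, rVal2: int, pc: int) -> ExecInst:
    """Mirror of the BSV exec: a pure function of its inputs."""
    aluVal2 = d.imm if d.immValid else rVal2
    aluRes = alu(rVal1, aluVal2, d.aluFunc)
    brAddr = br_addr_calc(pc, rVal1, d.iType, d.imm)
    return ExecInst(
        iType=d.iType,
        rDst=d.rDst,
        addr=aluRes if d.iType in ("Ld", "St") else brAddr,  # memType
        data=rVal2 if d.iType == "St" else aluRes,
        brTaken=alu_br(rVal1, aluVal2, d.brComp),
    )
```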
Single-Cycle SMIPS atomic state updates

    if (memType(eInst.iType))
      eInst.data <- dMem(MemReq{op: eInst.iType == Ld ? Ld : St,
                                addr: eInst.addr, data: eInst.data});
    if (regWriteType(eInst.iType))
      rf.wr(eInst.rDst, eInst.data);
    pc <= eInst.brTaken ? eInst.addr : pc + 4;
  endrule
endmodule
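All three updates sit in one rule, so they take effect atomically at a single clock edge. The following Python sketch models that commit step. It makes three assumptions: a list stands in for the register file, a dict stands in for data memory, and unwritten memory reads as 0. The hypothetical `commit` function returns the next PC instead of writing a pc register.

```python
def commit(e: dict, rf: list, dmem: dict, pc: int) -> int:
    """Apply one instruction's state updates and return the next PC.

    e holds the ExecInst fields: iType, rDst, data, addr, brTaken.
    """
    data = e["data"]
    if e["iType"] in ("Ld", "St"):                    # memType
        if e["iType"] == "Ld":
            data = dmem.get(e["addr"], 0)             # unwritten memory reads as 0 (assumption)
        else:
            dmem[e["addr"]] = data
    if e["iType"] in ("Alu", "Ld", "Jal", "Jalr"):    # regWriteType
        rf[e["rDst"]] = data
    return e["addr"] if e["brTaken"] else (pc + 4) & 0xFFFFFFFF
```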
Single-Cycle SMIPS: Clock Speed
[Figure: single-cycle SMIPS datapath with PC, Inst Memory, Decode, Register File, Execute, Data Memory, and +4]

$t_{Clock} > t_M + t_{DEC} + t_{RF} + t_{ALU} + t_M + t_{WB}$

We can improve the clock speed if we execute each instruction in two clock cycles:

$t_{Clock} > \max\{t_M,\; t_{DEC} + t_{RF} + t_{ALU} + t_M + t_{WB}\}$
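Plugging numbers into the two clock-period inequalities shows what splitting the instruction into two cycles buys. The delays below are hypothetical and chosen only for illustration.

```python
def single_cycle_period(tM, tDEC, tRF, tALU, tWB):
    # One clock covers fetch, decode, register read, ALU,
    # data-memory access and write-back in series.
    return tM + tDEC + tRF + tALU + tM + tWB


def two_cycle_period(tM, tDEC, tRF, tALU, tWB):
    # Cycle 1 holds only the instruction fetch; cycle 2 holds the rest.
    return max(tM, tDEC + tRF + tALU + tM + tWB)


delays = dict(tM=10, tDEC=2, tRF=3, tALU=5, tWB=2)   # hypothetical delays in ns
t1 = single_cycle_period(**delays)
t2 = two_cycle_period(**delays)
print(f"single-cycle: {t1} ns clock, {t1} ns per instruction")
print(f"two-cycle:    {t2} ns clock, {2 * t2} ns per instruction")
```

With these numbers, the clock period drops from 32 ns to 22 ns, but each instruction now takes two cycles (44 ns). This is no accident. Write the single-cycle period as a + b, where a = tM and b is the rest of the path. Because 2·max(a, b) ≥ a + b, a two-cycle machine that runs one instruction at a time can never beat the single-cycle machine on time per instruction. The faster clock pays off only once the idle hardware is kept busy.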
Two-Cycle SMIPS
[Figure: two-cycle SMIPS datapath, the single-cycle datapath plus an ir register between Inst Memory and Decode and a stage register]
Introduce register “ir” to hold a fetched instruction and register “stage” to remember which stage (fetch/execute) we are in
ir: The instruction register

You may recall from our earlier discussion of pipelining that when we take multiple cycles to perform some operation (e.g., IFFT), intermediate registers may not contain meaningful data in some cycles.

It is straightforward to convert ir into a pipeline register: we can associate a Valid/Invalid bit with ir. Equivalently, we can think of ir as a single-element FIFO.
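Here is a minimal Python sketch of ir viewed as a single-element FIFO: one data register plus a valid bit. The method names echo the usual BSV FIFO methods (enq, deq, first, notFull, notEmpty), but the class itself is an illustrative assumption, not the library module. In this view, the valid bit carries the same information as the stage register. Fetch may proceed only when ir is empty, and Execute only when ir is full.

```python
class OneElementFifo:
    """ir as a data register plus a valid bit."""

    def __init__(self):
        self.valid = False
        self.data = None

    def not_full(self) -> bool:
        return not self.valid

    def not_empty(self) -> bool:
        return self.valid

    def enq(self, x):
        assert not self.valid, "enq on a full FIFO"
        self.data, self.valid = x, True

    def first(self):
        assert self.valid, "first on an empty FIFO"
        return self.data

    def deq(self):
        assert self.valid, "deq on an empty FIFO"
        self.valid = False


ir = OneElementFifo()
if ir.not_full():                 # fetch side
    ir.enq({"pc": 0x0, "inst": 0x01095021})
if ir.not_empty():                # execute side
    entry = ir.first()
    ir.deq()
```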
Additional Types

typedef struct {
  Addr     pc;
  Bit#(32) inst;
} TypeFetch2Decode deriving (Bits, Eq);

typedef enum {Fetch, Execute} TypeStage deriving (Bits, Eq);
Two-Cycle SMIPS

module mkProc(Proc);
  Reg#(Addr) pc  <- mkRegU;
  RFile      rf  <- mkRFile;
  Memory     mem <- mkTwoPortedMemory;
  let iMem = mem.iport;
  let dMem = mem.dport;
  Reg#(TypeFetch2Decode) ir    <- mkRegU;
  Reg#(TypeStage)        stage <- mkReg(Fetch);

  rule doFetch (stage == Fetch);
    let inst = iMem(MemReq{op:Ld, addr:pc, data:?});
    ir <= TypeFetch2Decode{pc: pc, inst: inst};
    stage <= Execute;
  endrule
Two-Cycle SMIPS

  rule doExecute (stage == Execute);
    let irpc  = ir.pc;
    let inst  = ir.inst;
    let dInst = decode(inst);
    Data rVal1 = rf.rd1(dInst.rSrc1);
    Data rVal2 = rf.rd2(dInst.rSrc2);
    let eInst = exec(dInst, rVal1, rVal2, irpc);
    // state updates: no change from single-cycle
    if (memType(eInst.iType))
      eInst.data <- dMem(MemReq{op: eInst.iType == Ld ? Ld : St,
                                addr: eInst.addr, data: eInst.data});
    if (regWriteType(eInst.iType))
      rf.wr(eInst.rDst, eInst.data);
    pc <= eInst.brTaken ? eInst.addr : pc + 4;
    stage <= Fetch;
  endrule
endmodule
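The effect of the stage register can be simulated in a few lines of Python. Each loop iteration models one clock cycle in which exactly one of doFetch and doExecute fires, as selected by stage. The two-instruction toy ISA, `("nop",)` and `("j", target)`, and the function names are hypothetical and exist only to drive the model.

```python
def run_two_cycle(imem: list, execute, cycles: int) -> list:
    """Simulate `cycles` clock cycles; return the PCs of executed instructions."""
    pc, stage, ir = 0, "Fetch", None
    executed = []
    for _ in range(cycles):
        if stage == "Fetch":                 # rule doFetch (stage == Fetch)
            ir = (pc, imem[pc // 4])
            stage = "Execute"
        else:                                # rule doExecute (stage == Execute)
            irpc, inst = ir
            pc = execute(irpc, inst)         # next PC
            executed.append(irpc)
            stage = "Fetch"
    return executed


def toy_execute(pc: int, inst: tuple) -> int:
    # Hypothetical toy ISA: ("nop",) falls through, ("j", target) jumps.
    return inst[1] if inst[0] == "j" else pc + 4


program = [("nop",), ("nop",), ("j", 0)]
trace = run_two_cycle(program, toy_execute, 8)
print(trace)
```

Eight cycles complete four instructions, which is exactly one instruction every two cycles.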
Princeton versus Harvard Architecture
The Harvard architecture uses different memories for instructions and data, which is needed for a single-cycle implementation.
The Princeton architecture uses the same memory for instructions and data and thus requires at least two cycles to execute Load/Store instructions.
The two-cycle implementations of the Princeton and Harvard architectures are almost the same.
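The memory-port argument can be made concrete by counting how many memory accesses each scheme needs in a single clock cycle. In the following Python sketch, the function name and the instruction-type strings are illustrative assumptions. A single-cycle machine needs two simultaneous accesses for any Ld or St, which a one-ported Princeton memory cannot supply. The two-cycle schedule never needs more than one.

```python
def peak_memory_accesses(itypes: list, cycles_per_inst: int) -> int:
    """Largest number of memory accesses any single clock cycle needs."""
    peak = 0
    for itype in itypes:
        data_access = 1 if itype in ("Ld", "St") else 0
        if cycles_per_inst == 1:
            # Single cycle: instruction fetch and data access share one cycle.
            peak = max(peak, 1 + data_access)
        else:
            # Two cycles: fetch in cycle 1, data access (if any) in cycle 2.
            peak = max(peak, 1, data_access)
    return peak


program = ["Alu", "Ld", "St", "Br"]
print(peak_memory_accesses(program, 1), peak_memory_accesses(program, 2))
```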
SMIPS Princeton Architecture
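[Figure: Princeton-style SMIPS datapath with a single Memory shared by instruction fetch and data access, plus PC, +4, ir and stage registers, Decode, Register File and Execute]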
Since both the Fetch and Execute stages want to use the memory, there is a structural hazard in accessing memory
Two-Cycle SMIPS Princeton

module mkProc(Proc);
  Reg#(Addr) pc  <- mkRegU;
  RFile      rf  <- mkRFile;
  Memory     mem <- mkOnePortedMemory;
  let uMem = mem.port;
  Reg#(TypeFetch2Decode) ir    <- mkRegU;
  Reg#(TypeStage)        stage <- mkReg(Fetch);

  rule doFetch (stage == Fetch);
    let inst <- uMem(MemReq{op:Ld, addr:pc, data:?});
    ir <= TypeFetch2Decode{pc: pc, inst: inst};
    stage <= Execute;
  endrule
Two-Cycle SMIPS Princeton

  rule doExecute (stage == Execute);
    let irpc  = ir.pc;
    let inst  = ir.inst;
    let dInst = decode(inst);
    Data rVal1 = rf.rd1(dInst.rSrc1);
    Data rVal2 = rf.rd2(dInst.rSrc2);
    let eInst = exec(dInst, rVal1, rVal2, irpc);
    if (memType(eInst.iType))
      eInst.data <- uMem(MemReq{op: eInst.iType == Ld ? Ld : St,
                                addr: eInst.addr, data: eInst.data});
    if (regWriteType(eInst.iType))
      rf.wr(eInst.rDst, eInst.data);
    pc <= eInst.brTaken ? eInst.addr : pc + 4;
    stage <= Fetch;
  endrule
endmodule
Two-Cycle SMIPS: Analysis
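[Figure: two-cycle SMIPS datapath with ir and stage registers; the Fetch cycle uses the PC and instruction memory, while the Execute cycle uses Decode, Register File, Execute and Data Memory]
In any given clock cycle, lots of unused hardware!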
Next lecture: pipelining to increase the throughput