Pipeline Datapath
Processor Design Report
CPRE 381
By: Daniel Keith and Luke McDonald
Overview
The objective of this paper is to explain in detail the steps, difficulties, and successes our group had over the course of designing a pipelined datapath processor. The project allowed us to sharpen our skills by practicing the concepts we have been learning in class. Throughout this paper we explain the difficulties we encountered when designing the processor, present our results, and compare them with the expected outcomes. Finally, we discuss some of the lessons learned about working together in a team and about several basic processor designs.
Basic Description of the project
We chose to design a very simple pipelined CPU. The goal is that it should be able to execute a “meaningful” program. One of the additional requirements is that it must be able to execute a for loop. We decided our datapath width would be 32 bits because that matched how we had constructed our previous labs: both our single-cycle and multi-cycle processors used a 32-bit datapath, so we figured that we could get the pipeline to work in 32 bits as well. We also considered adding additional operations to the processor, but we figured that would take more time than we had available. The most important thing is that the project provided an adequate amount of challenge for us to learn many things about pipelined processor design.
Starting Out
We decided to break the processor design up into several steps. Breaking the project up this way allowed us to plan our work and make steady progress. Much like the example report, we divided our project into the following steps:
- determine the architecture of the processor
- establish the instruction set
- determine the pipeline stages
- specify the behavior of the clock signal
- design, implement, and test the ALU
- draw each stage of the data path
- define the control lines
- determine the control lines
- determine the control line coding for each instruction
- implement and test the control module
- design and implement remaining modules, testing as necessary
- draw the final data path and name each wire
- implement the data path
- select the sequence of instructions to test the processor
- enter the machine code corresponding to these instructions into the instruction memory
- simulate the processor and verify execution
- look for oversights and modify the data path to correct
- add data forwarding and hazard detection
- determine the MIPS instruction code
- test and debug the MIPS code in the processor
Thanks to these steps, it was much easier for us to organize our thoughts and assign tasks to each person in the group.
Determining the architecture of the processor
The architecture of the processor is a very simple CPU design, as we have already stated. It has a few simple instruction formats, each with a fixed length. We chose a MIPS processor since it was what we were most familiar with from class.
Data Path
Our datapath is set up as a modified MIPS processor. We decided to use a 32-bit datapath because several of our previous datapaths were modeled on a 32-bit structure. The design may not strictly need 32 bits, but we figured that if we encountered issues we could always drop the datapath width down to 16 bits. Additionally, from what we can tell, most of the code that we had written in our other labs, such as our memory register and our ALU, was also modeled on 32-bit systems, so it made more sense to our group to implement the datapath in this manner.
Memory
We decided to implement our data memory and instruction memory separately, much like the approach modeled in some of the lecture slides. Our processor supports 32-bit memory addressing for both data memory and instruction memory, although the memories described here only decode 8 address bits. We reused some code that we had written for lab 2. In that lab we used Quartus' MegaWizard Plug-In Manager to create a custom megafunction: a 1-port RAM with 256 words of storage, a 32-bit data width, and an 8-bit address. We used a similar megafunction module in the pipelined processor design.
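For readers without the Quartus megafunction at hand, the memory described above can be approximated behaviorally. The following is only a minimal sketch and not the code generated by the wizard: the module name ram_256x32 is hypothetical, the port order simply mirrors the positional order used where the memories are instantiated in the appendix, and the synchronous write with a registered read is an assumption about how the RAM was configured.

```verilog
// Behavioral stand-in for a 1-port RAM: 256 words x 32 bits, 8-bit address.
// Hypothetical module; NOT the megafunction output used in the project.
module ram_256x32 (
    input  wire [7:0]  address,  // selects one of 256 words
    input  wire        clock,
    input  wire [31:0] data,     // word to store when wren is 1
    input  wire        wren,     // write enable
    output reg  [31:0] q         // registered read data
);
    reg [31:0] mem [0:255];

    always @(posedge clock) begin
        if (wren)
            mem[address] <= data;
        // Registered read (assumption): on a write cycle this returns
        // the word that was stored before the write.
        q <= mem[address];
    end
endmodule
```

In simulation the array could be preloaded with $readmemh from a hex file, which plays roughly the role that the .mif files play for the megafunction.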
Registers
The register file we used for the final project is a modified version of the register file we built in lab 4. In lab 4 we were asked to design a file with 16 16-bit registers; for our final project we updated the register file to consist of 32 32-bit registers. Each of these registers is intended to be a general-purpose register. You can see the model we followed for the register file on the left. The image is taken directly from our lab 4 assignment.
Instruction Format
We used the standard 32-bit instruction format for R-type, load/store, and branch instructions. For each of these instruction types, the opcode occupies the first 6 bits (31:26), as you can see in the figure below. The opcode allows the processor to determine the type of instruction, which tells it how the instruction should be handled. Bits 25:21 select the first source register (rs) and bits 20:16 select the second register (rt), which is the destination for loads and the data source for stores. The remaining bits (15:0) typically hold the address offset of the instruction; for R-type instructions they hold the destination register (15:11) and the function code (5:0). You can see this listed in the diagram below. This diagram is directly from the lecture slides. For jump instructions the opcode must be 000010, the value our control module decodes as a jump, in order for the instruction to function properly.
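The field boundaries above can be written down directly as bit slices. This is a small illustrative sketch; the module name instr_fields and its port names are hypothetical and do not appear in our design, where the slices are taken inline from the IF/ID register.

```verilog
// Splits a 32-bit MIPS-style instruction word into its fields.
// Hypothetical helper for illustration only.
module instr_fields (
    input  wire [31:0] instr,
    output wire [5:0]  opcode,
    output wire [4:0]  rs,
    output wire [4:0]  rt,
    output wire [4:0]  rd,
    output wire [15:0] imm,
    output wire [5:0]  funct
);
    assign opcode = instr[31:26];
    assign rs     = instr[25:21];
    assign rt     = instr[20:16];
    assign rd     = instr[15:11];  // meaningful for R-type only
    assign imm    = instr[15:0];   // offset for load/store/branch
    assign funct  = instr[5:0];    // meaningful for R-type only
endmodule
```

For example, the word 32'h8C010000 splits into opcode 6'h23, rs 0, rt 1, and imm 0, which is the standard MIPS encoding of lw R1, 0(R0).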
Establishing the instruction set
The instruction set is rather important for our basic processor. We chose to use a 6-bit opcode since we knew how each of the bits was assigned: we talked about them specifically in lecture and they were featured on the lecture slides. Additionally, we used some instructions that we had already used in lab 7, as you can see in the figure below.
Special Added instructions
BNE: Branches on not equal, useful for our for loop simulation.
Jump: Jumps to the address specified in the instruction.
R-Type instructions
For our instruction set we implemented several 32-bit instructions in our ALU. The instructions and their ALU operation codes are listed below. In the appendix code the bitinvert flag is the top bit of the 3-bit ALU operation, so SUB appears there as 110 and SLT as 111.
AND (ALU op 000): Bitwise AND of ALU inputs A and B.
OR (ALU op 001): Bitwise OR of ALU inputs A and B.
ADD (ALU op 010): Adds ALU input A to ALU input B.
SUB (ALU op 010 with bitinvert): Subtracts ALU input B from ALU input A, assuming bitinvert is set to 1.
SLT (ALU op 011): If A is less than B, the destination register is set to 1; otherwise it is set to 0.
Determining the pipeline stages
The processor was divided into the following stages, just as it was in class. We based our design on the figure below.
The instructions and operations are as follows:
Instruction fetch:
- read the instruction at the current PC address from the instruction memory
- increment the value of the PC
- update the PC with a branch or jump target when a previous instruction requires it
Decode:
- a comparator in this stage compares the two values read from the register file to evaluate branch conditions
- all branches are resolved in this stage rather than in the memory stage
- calculate the branch address or decode the jump address
- assign the control line values for the decoded instruction
- check the registers being read so the processor can determine whether a stall or a forward is needed
Execution:
- compute the ALU result
- determine if a branch condition is met
- stall the pipeline if the next instruction needs the result of the calculation
- forward operands if needed
- select the final destination register for the result
Memory Access:
- read or write to the memory if it is necessary
Write Back:
- write to the register file if it is needed
- select between the ALU result and the data read from memory
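The PC update described for the instruction fetch stage above comes down to a three-way choice each cycle. The sketch below is illustrative only: the module and port names are hypothetical, and giving the jump priority over the branch is an assumption (the two are mutually exclusive by opcode, so the order should not matter in practice). It assumes the 8-bit, word-addressed PC used in the appendix, where the next sequential instruction is at PC + 1.

```verilog
// Next-PC selection for an 8-bit, word-addressed program counter.
// Hypothetical sketch; names and priority order are assumptions.
module next_pc (
    input  wire [7:0] pc,
    input  wire [7:0] branch_target,  // computed in decode
    input  wire [7:0] jump_target,    // taken from the instruction word
    input  wire       take_branch,    // branch condition met
    input  wire       jump,
    output reg  [7:0] pc_next
);
    always @(*) begin
        if (jump)
            pc_next = jump_target;
        else if (take_branch)
            pc_next = branch_target;
        else
            pc_next = pc + 8'd1;      // sequential fetch
    end
endmodule
```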
Determining the behavior of the clock signal
We decided that the clock would be a simple on/off signal that alternates between its two states every cycle. Even though this is a very simple idea, it is vital to our project: if the clock is not “aligned” across modules, it can throw off all of our data and break the pipeline. Because an instruction often has to flow through several modules in a single clock cycle, the logic between registers is entirely combinational. On the rising clock edge the registers record the values at their inputs and begin driving them on their outputs. On the falling clock edge, the register file stores the input it is given, assuming the WriteReg bit (RegWrite in the appendix code) is equal to 1. This means that a value written to a register is immediately available when that same register is being read. Additionally, the memory module records its input only if MemWrite is equal to 1; in this respect the memory module acts similarly to the register file.
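The rising-edge behavior described above is what every pipeline register between stages does. As a minimal sketch, assuming a parameterized width and an active-high enable (the name pipe_reg and both parameters are hypothetical; the appendix instead uses fixed-width reg_1 and reg_32 modules), a register that can also be frozen during a stall might look like this:

```verilog
// Rising-edge pipeline register with a write enable.
// Hypothetical sketch; holding the value when en is 0 is how a stall
// keeps an instruction in place for an extra cycle.
module pipe_reg #(parameter WIDTH = 32) (
    input  wire             clk,
    input  wire             en,   // 1: capture d, 0: hold current value
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    initial q = {WIDTH{1'b0}};    // start from a known value in simulation

    always @(posedge clk) begin
        if (en)
            q <= d;
    end
endmodule
```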
Designing, implementing, and testing the ALU
For our pipeline design we used a design similar to the one from lab 6, where we were asked to build a 32-bit ALU. Our design operates on each bit of input A together with the corresponding bit of input B. The operation performed on each bit pair is determined by the operation code that we give to the ALU. As you can see in the figure below, several operations are computed bit by bit.
The ALU was set up by chaining 32 one-bit ALUs together. The carry out of each one-bit unit is transferred to the next one-bit unit, and overflow is detected at the last unit. You can view a section of the module below.
Figure Above: A one bit block diagram of our ALU
Figure Above: Basic Idea for ALU data structure
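The bit-slice structure shown in the figures can be sketched in Verilog as follows. This is an illustration of the general technique and not the ALU listed in the appendix, which is written behaviorally; the module names alu_1bit and alu_ripple, the 2-bit op encoding (00 AND, 01 OR, 10 ADD, 11 SLT), and the overflow-corrected set bit for SLT are assumptions chosen to match the textbook design.

```verilog
// One-bit ALU slice: AND, OR, ADD, plus a "less" input used for SLT.
// Hypothetical sketch of the bit-slice idea.
module alu_1bit (
    input  wire       a, b, cin, less, binvert,
    input  wire [1:0] op,
    output reg        result,
    output wire       sum,     // raw adder output, needed for SLT
    output wire       cout
);
    wire bb = b ^ binvert;     // optionally invert B for subtraction
    assign sum  = a ^ bb ^ cin;
    assign cout = (a & bb) | (a & cin) | (bb & cin);

    always @(*) begin
        case (op)
            2'b00: result = a & bb;
            2'b01: result = a | bb;
            2'b10: result = sum;
            2'b11: result = less;
        endcase
    end
endmodule

// N one-bit slices chained through their carries.
module alu_ripple #(parameter N = 32) (
    input  wire [N-1:0] a, b,
    input  wire         binvert,   // 1 for SUB and SLT
    input  wire [1:0]   op,
    output wire [N-1:0] result,
    output wire         overflow
);
    wire [N:0]   c;
    wire [N-1:0] sum;

    // Carry-in of 1 completes the two's-complement negation of B.
    assign c[0] = binvert;
    // Signed overflow: carry into the top bit differs from carry out.
    assign overflow = c[N] ^ c[N-1];
    // True sign of A - B, corrected for overflow; feeds bit 0 for SLT.
    wire set = sum[N-1] ^ overflow;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : slice
            alu_1bit u (
                .a(a[i]), .b(b[i]), .cin(c[i]),
                .less((i == 0) ? set : 1'b0),
                .binvert(binvert), .op(op),
                .result(result[i]), .sum(sum[i]), .cout(c[i+1])
            );
        end
    endgenerate
endmodule
```

With this encoding, the 3-bit code formed by {binvert, op} gives 110 for SUB and 111 for SLT, which lines up with the case values used by the behavioral ALU in the appendix.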
Drawing each stage of the datapath
Both members of the group had to think carefully about how the pipeline datapath functions in our processor design. We decided to model our datapath closely on some slides that we had studied while working out a design. You can find a model of the conceptual datapath as well as a model of the state datapath below.
Figure Above: A diagram to help us conceptualize how data will flow through the basic processor
Implementing and testing the control module
In this section we explain how we implemented and tested the control module. We had used the same idea when implementing a control module for lab 8, where we designed a state machine for the multicycle processor to step through. We implemented each state of the machine in one large case statement in our control unit. The drawback of writing the control unit this way is that it is rather difficult to debug. We then simulated and debugged our code in ModelSim, as we had been doing for most of the semester.
Figure Above: A model of the datapath with modules in it to help us break the design up
After many hours of testing we were finally able to get our state machine for the pipeline set up. At this point we began adding and repairing the other modules that we needed for the processor. We also updated ModelSim’s library list by adding the “altera_mf_ver” library in order to get our simulation to compile properly; this library is needed for the megafunction memory modules to function. After the simulation compiled properly we began adding and implementing some MIPS instructions, as we had done in previous labs.
Adding Forwarding and hazard detection
We also had to add data forwarding and hazard detection to our processor design. We really struggled with implementing this part of the processor, mostly because neither group member understood forwarding or hazard detection well at the start. It took us many hours of work to debug and determine how the forwarding unit should function. We did get some practice with forwarding and hazard detection during lab 9, where we had to design both a forwarding unit and a hazard detection unit. We wound up using code very similar to our lab 9 code in the pipelined processor in order to get our data to forward properly. You can find the Verilog code that we used in the pipeline below.
Figure Above: The Verilog code for our forwarding unit
Figure Above: The Verilog code from our hazard unit
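Since the two figures are screenshots, the general logic behind such units is sketched here in text form. This is the textbook formulation and not a transcription of our forward and hazard modules: all names are hypothetical, the sketch produces mux select signals whereas our forward module in the appendix takes the data values and outputs the ALU operands directly, and the test against register 0 assumes register 0 is treated as a constant zero as in MIPS.

```verilog
// Textbook-style forwarding selects. Hypothetical names throughout.
module forward_sel (
    input  wire [4:0] ex_rs,         // sources of the instruction in EX
    input  wire [4:0] ex_rt,
    input  wire [4:0] mem_rd,        // destination of the instruction in MEM
    input  wire [4:0] wb_rd,         // destination of the instruction in WB
    input  wire       mem_regwrite,
    input  wire       wb_regwrite,
    output reg  [1:0] sel_a,         // 00: register file, 10: MEM, 01: WB
    output reg  [1:0] sel_b
);
    always @(*) begin
        sel_a = 2'b00;
        sel_b = 2'b00;
        // The newer result (MEM) takes priority over the older one (WB).
        // The != 0 checks assume register 0 always reads as zero.
        if (mem_regwrite && (mem_rd != 5'd0) && (mem_rd == ex_rs))
            sel_a = 2'b10;
        else if (wb_regwrite && (wb_rd != 5'd0) && (wb_rd == ex_rs))
            sel_a = 2'b01;

        if (mem_regwrite && (mem_rd != 5'd0) && (mem_rd == ex_rt))
            sel_b = 2'b10;
        else if (wb_regwrite && (wb_rd != 5'd0) && (wb_rd == ex_rt))
            sel_b = 2'b01;
    end
endmodule

// Load-use hazard detection: a load in EX followed immediately by an
// instruction in ID that reads the loaded register cannot be fixed by
// forwarding alone, so the pipeline stalls for one cycle.
module load_use_hazard (
    input  wire       ex_memread,    // instruction in EX is a load
    input  wire [4:0] ex_rt,         // register the load will write
    input  wire [4:0] id_rs,         // sources of the instruction in ID
    input  wire [4:0] id_rt,
    output wire       stall          // 1: hold PC and IF/ID, insert a bubble
);
    assign stall = ex_memread && ((ex_rt == id_rs) || (ex_rt == id_rt));
endmodule
```

The stall output would then be used to freeze the PC and the IF/ID register and to zero the control signals entering EX; the polarity of the corresponding enables in our own hazard module may differ from this sketch.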
MIPS Instruction Code
For the demo of our design we added some memory initialization files as well as a MIPS program that uses a simple for loop. We also had to add another .mif file to set up the contents of the data memory that the code steps through. You can find a screenshot of how the data is initialized during the first few cycles below. We wrote a MIPS program that uses several instructions from the MIPS instruction set, such as LW, BNE, ADD, and SW, to loop through the Fibonacci sequence. In order to run this program on our processor we had to convert the code into hexadecimal in our .mif file. In this file, the last few bits of each hex word point to registers or offsets, while the initial bits encode the instruction format. You can find a screenshot of the MIPS code as well as a screenshot of the encoded instructions below.
lw R1, 0(R0);
lw R2, 1(R0);
lw R3, 2(R0);
lw R4, 3(R0);
loop:
ADD R7, R7, R3;
ADD R5, R1, R2;
ADD R1, R2, R0;
ADD R2, R5, R0;
BNE R7, R4, loop;
SW R5, 6(R0);
Our MIPS instruction sequence
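To illustrate the conversion to hexadecimal, the sketch below packs a few of the instructions above into 32-bit words. It assumes the standard MIPS opcodes and function codes (lw = 0x23, sw = 0x2B, add = funct 0x20), which are the values our control and ALU control modules decode; the module and function names are hypothetical, and the actual contents of our .mif file are not reproduced here. BNE is left out because its offset depends on how the branch adder counts from the PC.

```verilog
// Simulation-only demo: packs instruction fields into 32-bit words.
// Hypothetical helper names; standard MIPS encodings are assumed.
module encode_demo;
    // I-type: opcode, rs, rt, 16-bit immediate
    function [31:0] itype(input [5:0] op, input [4:0] rs,
                          input [4:0] rt, input [15:0] imm);
        itype = {op, rs, rt, imm};
    endfunction

    // R-type: opcode 0, rs, rt, rd, shamt 0, funct
    function [31:0] rtype(input [4:0] rs, input [4:0] rt,
                          input [4:0] rd, input [5:0] funct);
        rtype = {6'd0, rs, rt, rd, 5'd0, funct};
    endfunction

    initial begin
        $display("lw  R1, 0(R0)  = %h", itype(6'h23, 5'd0, 5'd1, 16'd0)); // 8c010000
        $display("add R5, R1, R2 = %h", rtype(5'd1, 5'd2, 5'd5, 6'h20));  // 00222820
        $display("sw  R5, 6(R0)  = %h", itype(6'h2b, 5'd0, 5'd5, 16'd6)); // ac050006
    end
endmodule
```

In each word the upper bits carry the opcode and the lower bits carry the register numbers and offset, which matches the description of the hex layout above.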
Testing and Debugging the Mips code in the processor
Figure Above: Initial Data memory initialization file (.mif)
After properly configuring each of the .mif files our group was ready to begin testing and debugging the pipeline. We initially checked whether the data was being properly loaded into the processor. We started by testing a very simple add instruction, figuring we could check our actual MIPS for loop after we had determined which units were working properly. As you can see in the diagram below, the storage from the MIPS code above is working properly.
Figure Above: Storage is correct
As we continued to step through the code we checked whether our branch-not-equal instruction was functioning properly. This instruction is essential for our implementation because the loop depends on it. You can find a screenshot of the branch not equal working below.
Figure Above: Branch not equal functions properly
As we have previously mentioned, getting forwarding to work in our processor took up most of our time. After debugging and working out how forwarding was supposed to function, we were finally able to sort out the issues with our forwarding unit and get it to work. You can find a screenshot of the forwarding working properly on address 8 below.
Finally, after the processor has looped through each of the instructions set up by our mips code, the final step produces our expected result. You can see a screenshot of the last cycle as it completes the Fibonacci sequence.
Figure Above: An example of our functioning forwarding
Figure Above: Fibonacci sequence loop in its last state
Conclusion
There were many issues and struggles that our group had to overcome during the course of this project. Both group members were rather busy with other things, especially during dead week, so it was difficult for us to schedule time to work together. It was also a little frustrating that the lab was often full when we were trying to organize a time to meet. This meant that we occasionally had to reschedule or work on the project on our own time. However, this did give us some valuable practice in working together as a team. Eventually, we were able to pull together and complete a successful project that works incredibly well.
Appendix/Code Library:
Main code:
pc m1(in_pc[7:0],pc_address[7:0],INHINC,clk);// pc module
instruction_memory m2(pc_address[7:0],clk1, 32'h00000000,zero,im_out[31:0]);
assign add_out[7:0] = pc_address[7:0] +1;
reg_32 m35(add_out[7:0],add_out1[7:0], clk);//holds address counter
reg_32_IDIF m3(im_out[31:0],im_out_R[31:0],IFIDWE, clk);//holds instructions
reg_32 m21(im_out_R[20:11],wa_in[9:0], clk);//holds write address options
reg_32 m22(im_out_R[25:21], rs[4:0],clk);//holds data for forwarding
Mult_5 m23(wa_in[9:5],wa_in[4:0],DegDst1,wa_out[4:0]);//selector for write address for the file register
reg_32 m24(wa_out[4:0],wa_out1[4:0], clk);//holds write address
reg_32 m25(wa_out1[4:0],wa_out2[4:0], clk);//holds write address
hazard m26(MemRead1, wa_in[9:5], im_out_R[25:21], im_out_R[20:16], IFIDWE, INHINC, NOP);
v_file_reg_32by32 m27(RegWrite3,wa_out2[4:0], Data_m_out1[31:0], im_out_R[25:21],Data1[31:0],im_out_R[20:16],Data2[31:0]);// file register
compare m47(Data1[31:0],Data2[31:0], branch_zero);// test branch condition
reg_32 m28(Data1[31:0],Data1_R[31:0], clk);//holds data1
reg_32 m29(Data2[31:0],Data2_R[31:0], clk);//holds data2
reg_32 m30(Data2_R[31:0],Data2_R1[31:0], clk);//holds data2 for a second cycle as write data for the data memory
sign_extend m31(im_out_R[15:0],sg_out[31:0]); // bit extender from 16 to 32 bit
reg_32 m32(sg_out[31:0],sg_out1[31:0], clk);//holds sign_extender bits and ALU control bits
control m4(im_out_R[31:26],DegDst, ALUSrc, MemtoReg, RegWrite,MemRead, MemWrite,Branch_BEQ, ALUOp1,ALUOp0,Branch_BNE,jmp);
assign top_alu_out[7:0] = add_out1[7:0] + sg_out[7:0];
and(BEQ,branch_zero,Branch_BEQ);//if statement in enable
not(branch_zero1,branch_zero);
and(BNE,branch_zero1,Branch_BNE);
or(Bran,BEQ,BNE);
Mult_8by_4 m43(add_out[7:0],top_alu_out[7:0],im_out_R[7:0],z,Bran,jmp,in_pc[7:0]);//pc address selector
Mult m50(RegWrite,0,Bran);
Mult_32 m5({DegDst, ALUSrc, MemtoReg, RegWrite,MemRead, MemWrite,ALUOp1,ALUOp0},0,NOP,controls[7:0]);
reg_1 m6(controls[6],ALUSrc1, clk);//holds control signal for the EX stage, held for 1 cycle
reg_1 m7(controls[7],DegDst1, clk);//holds control signal for the EX stage, held for 1 cycle
reg_1 m8(controls[1],ALUOp1R, clk);//holds control signal for the EX stage, held for 1 cycle
reg_1 m9(controls[0],ALUOp0R, clk);//holds control signal for the EX stage, held for 1 cycle
reg_1 m10(controls[3],MemRead1, clk);//holds control signal for the Mem stage, held for 1 cycle for data hazard
reg_1 m11(controls[2],MemWrite1, clk);//holds control signal for the Mem stage, held for 2 cycles
reg_1 m12(MemWrite1,MemWrite2, clk);//holds control signal for the Mem stage, held for 2 cycles
reg_1 m15(controls[5],MemtoReg1, clk);//holds control signal for the WB stage, held for 3 cycles
reg_1 m16(MemtoReg1,MemtoReg2, clk);//holds control signal for the WB stage, held for 3 cycles
reg_1 m17(MemtoReg2,MemtoReg3, clk);//holds control signal for the WB stage, held for 3 cycles
reg_1 m18(controls[4],RegWrite1, clk);//holds control signal for the WB stage, held for 3 cycles
reg_1 m19(RegWrite1,RegWrite2, clk);//holds control signal for the WB stage, held for 3 cycles
reg_1 m20(RegWrite2,RegWrite3, clk);//holds control signal for the WB stage, held for 3 cycles
//reg_32 m42(alu_out1[31:0],alu_out2[31:0], clk);//holds ALU output for WB stage
data_memory m44(alu_out1[7:0],clk1,Data2_R1[31:0],MemWrite2, Data_m_out[31:0]);//data memory
Mult_32 m46(alu_out1[31:0],Data_m_out[31:0],MemtoReg2, in_reg[31:0]);//data selector
reg_32 m45(in_reg[31:0],Data_m_out1[31:0], clk);//holds data memory output
forward m38(rs[4:0], wa_in[9:5], wa_out1[4:0], wa_out2[4:0], RegWrite2, RegWrite3, Data1_R[31:0], Data2_R[31:0], alu_out[31:0], in_reg[31:0] , ALUIND1[31:0], ALUIND2[31:0]);
Mult_32 m34(ALUIND2[31:0],sg_out1[31:0],ALUSrc1, in_b[31:0]);//ALU selector for B input
ALU_control m33(ALUOp0R,ALUOp1R,sg_out1[5],sg_out1[4],sg_out1[3],sg_out1[2],sg_out1[1],sg_out1[0], ALU_op[2:0]); // The ALU control unit
alu32 m39(alu_out[31:0], ALU_op[2:0], ALUIND1[31:0], in_b[31:0],z, ALU_op[2]);//primary alu
reg_32 m40(alu_out[31:0],alu_out1[31:0], clk);//holds ALU output
Control code:
module control(instr, DegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite,Branch_BEQ, ALUOp1,ALUOp0,Branch_BNE,jmp);
input [5:0]instr;
output DegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite,Branch_BEQ, ALUOp1,ALUOp0,Branch_BNE,jmp;
reg DegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite,Branch_BEQ, ALUOp1,ALUOp0,Branch_BNE,jmp;
initial begin
DegDst = 0; ALUSrc = 1; MemtoReg = 1; RegWrite = 1; MemRead = 1; MemWrite = 0; Branch_BEQ = 0; ALUOp1 = 0;ALUOp0 = 0; Branch_BNE =0; jmp=0;
end
always @(*) begin
DegDst = (~instr[5]);
ALUSrc = (instr[5]);
MemtoReg = (instr[5]);
RegWrite = (~instr[4]&~instr[3]&~instr[2]);
MemRead = (instr[5]&~instr[4]&~instr[3]);
MemWrite = (instr[5]&~instr[4]&instr[3]);
Branch_BEQ = (~instr[5]&~instr[4]&~instr[3]&instr[2]&~instr[0]);
ALUOp1 = (~instr[5]&~instr[4]&~instr[3]&~instr[2]);
ALUOp0 = (~instr[5]&~instr[4]&~instr[3]&instr[2]);
Branch_BNE = (~instr[5]&~instr[4]&~instr[3]&instr[2]&instr[0]);
jmp = (~instr[5]&~instr[4]&~instr[3]&~instr[2]&instr[1]&~instr[0]);
end
endmodule
File Register:
module v_file_reg_32by32 (input write, input [4:0] WA, input [31:0] LD_Data, input [4:0] RA1, output reg [31:0] Data1, input [4:0] RA2, output reg [31:0] Data2);
reg [31:0] reg0, reg1, reg2, reg3, reg4, reg5, reg6, reg7, reg8, reg9, reg10, reg11, reg12, reg13, reg14, reg15, reg16, reg17, reg18, reg19, reg20, reg21, reg22, reg23, reg24, reg25, reg26, reg27, reg28, reg29, reg30, reg31;
initial
begin
reg0 = 0; reg1 = 0; reg2 = 0; reg3 = 0; reg4 = 0; reg5 = 0; reg6 = 0; reg7 = 0; reg8 = 0; reg9 = 0; reg10 = 0; reg11 = 0; reg12 = 0; reg13 = 0; reg14 = 0; reg15 = 0; reg16 = 0; reg17 = 0; reg18 = 0; reg19 = 0; reg20 = 0; reg21 = 0; reg22 = 0; reg23 = 0; reg24 = 0; reg25 = 0; reg26 = 0; reg27 = 0; reg28 = 0; reg29 = 0; reg30 = 0; reg31 = 0;
end
always @(*) begin
case (RA1)
0: Data1 = reg0;1: Data1 = reg1;2: Data1 = reg2;3: Data1 = reg3;4: Data1 = reg4;5: Data1 = reg5;6: Data1 = reg6;7: Data1 = reg7;8: Data1 = reg8;9: Data1 = reg9;10: Data1 = reg10;11: Data1 = reg11;12: Data1 = reg12;13: Data1 = reg13;14: Data1 = reg14;15: Data1 = reg15;16: Data1 = reg16;17: Data1 = reg17;18: Data1 = reg18;19: Data1 = reg19;20: Data1 = reg20;21: Data1 = reg21;22: Data1 = reg22;23: Data1 = reg23;24: Data1 = reg24;
25: Data1 = reg25; 26: Data1 = reg26;
27: Data1 = reg27;28: Data1 = reg28;29: Data1 = reg29;30: Data1 = reg30;31: Data1 = reg31;default: Data1 = 32'h00000000;
endcase
end
always @(*) begin
case (RA2)
0: Data2 = reg0;1: Data2 = reg1;2: Data2 = reg2;3: Data2 = reg3;4: Data2 = reg4;5: Data2 = reg5;6: Data2 = reg6;7: Data2 = reg7;8: Data2 = reg8;9: Data2 = reg9;10: Data2 = reg10;11: Data2 = reg11;12: Data2 = reg12;13: Data2 = reg13;14: Data2 = reg14;15: Data2 = reg15;16: Data2 = reg16;17: Data2 = reg17;18: Data2 = reg18;19: Data2 = reg19;20: Data2 = reg20;21: Data2 = reg21;22: Data2 = reg22;23: Data2 = reg23;24: Data2 = reg24;25: Data2 = reg25;
26: Data2 = reg26;27: Data2 = reg27;28: Data2 = reg28;29: Data2 = reg29;30: Data2 = reg30;31: Data2 = reg31;default: Data2 = 32'h00000000;
endcase
end
always @(*) begin
if (write)
case (WA) 0: reg0 <= LD_Data; 1: reg1 <= LD_Data; 2: reg2 <= LD_Data;
3: reg3 <= LD_Data; 4: reg4 <= LD_Data; 5: reg5 <= LD_Data; 6: reg6 <= LD_Data; 7: reg7 <= LD_Data; 8: reg8 <= LD_Data; 9: reg9 <= LD_Data; 10: reg10 <= LD_Data; 11: reg11 <= LD_Data; 12: reg12 <= LD_Data; 13: reg13 <= LD_Data; 14: reg14 <= LD_Data; 15: reg15 <= LD_Data; 16: reg16 <= LD_Data; 17: reg17 <= LD_Data; 18: reg18 <= LD_Data; 19: reg19 <= LD_Data; 20: reg20 <= LD_Data; 21: reg21 <= LD_Data; 22: reg22 <= LD_Data; 23: reg23 <= LD_Data; 24: reg24 <= LD_Data; 25: reg25 <= LD_Data; 26: reg26 <= LD_Data; 27: reg27 <= LD_Data; 28: reg28 <= LD_Data; 29: reg29 <= LD_Data; 30: reg30 <= LD_Data; 31: reg31 <= LD_Data; endcase // case (wrAddr)
end // always @(*)
endmodule
Instruction fetch and data memory:
The code for both of these modules was created by the Quartus II MegaWizard, so there is no need to list it here; both are standard 1-port RAM modules generated by the wizard.
Registers: We used standard register code and specialized the registers for various tasks; some have write-enable bits and some do not. They are all based on this code:
module reg_1(I,Q,clk);
input I;
input clk;
output Q;
reg Q;
initial
begin
Q = 0;
end
// capture the input on every rising clock edge
always @(posedge clk) begin
if(1) Q <= I; // enable is tied high in this base version
end
endmodule
Multiplexers:
Used standard multiplexer code to make various bit width units and various number of inputs based off of this:
module Mult(x,y,s,out);
input x,y;
input s;
output out;
reg in;
always @(*) begin
if(s) in = y;
else in = x;
end
assign out = in;
endmodule
ALU:
module alu32(Result, ALUOp, A, B, Zero, binvert);
output [31:0] Result;
reg [31:0] Result;
input binvert;
output Zero;
reg Zero;
input [2:0] ALUOp;
input [31:0] A, B;
reg [31:0] b;
// negate B (two's complement) when binvert is set, so that A + b computes A - B
always @(*) begin
if(binvert) b = ~B + 1;
else b = B;
end
initial begin
Result = 0;
Zero = 0;
end
always @(*) begin
case(ALUOp)
3'b000: Result = A & b;
3'b001: Result = A | b;
3'b010: Result = A + b;
3'b110: Result = A + b; // subtract: b holds -B when binvert is set
3'b111: Result = ($signed(A) < $signed(B)) ? 32'd1 : 32'd0; // set on less than
default: Result = 32'd0;
endcase
end
// Zero is asserted only when A equals B while binvert is set
always @(*) begin
if((A == B) & binvert) Zero = 1;
else Zero = 0;
end
endmodule
ALU Control:
module ALU_control(ALUOp0,ALUOp1,F5,F4,F3,F2,F1,F0,Operation);
input ALUOp0,ALUOp1,F5,F4,F3,F2,F1,F0;
output [2:0]Operation;
reg [2:0]op;
always @(*) begin
if((~ALUOp1&~ALUOp0)|(ALUOp1&~F3&~F2&~F1&~F0)) op = 2; // add
else if( (ALUOp0)|(ALUOp1&~F3&~F2&F1&~F0) ) op = 6; // subtract
else if(ALUOp1&~F3&F2&~F1&~F0) op = 0; // and
else if(ALUOp1&~F3&F2&~F1&F0) op = 1; // or
else if(ALUOp1&F3&~F2&F1&~F0) op = 7; // set on less than
end
assign Operation = op;
endmodule
Sign extend:
module sign_extend(in, out);
input [15:0]in;
output [31:0]out;
// copy the low 16 bits and replicate the sign bit into the upper 16 bits
assign out = {{16{in[15]}}, in};
endmodule
Compare:
Used for branches to shorten the delay of the address change.
module compare(in1,in2,out);
input [31:0] in1 , in2;
output out;
reg out;
always @(*) begin
if(in1 == in2) out = 1;
else out = 0;
end
endmodule