Lec 15 Systems Architecture 1
Systems Architecture
Lecture 15: A Simple Implementation of MIPS
Jeremy R. Johnson
Anatole D. Ruslanov
William M. Mongan
Some or all figures from Computer Organization and Design: The Hardware/Software Approach, Third Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 2004 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).
Introduction
• Objective: To understand how to implement the MIPS
instruction set.
• Combine components (registers, memory, ALU) and add
control
• Fetch-Execute cycle
• Topics
– Sequential logic (elements with state) and timing (edge triggered)
• Memory
• Registers
– Datapath components: Instruction memory, PC, Adder, Register File,
ALU, Data Memory
– Implement a subset of MIPS in a single cycle computer
– Shortcomings of a single cycle computer
The Processor: Datapath & Control
• Implementation of MIPS
• Simplified to contain only:
– memory-reference instructions: lw, sw
– arithmetic-logical instructions: add, sub, and, or, slt
– control flow instructions: beq, j
• Generic Implementation:
– use the program counter (PC) to supply instruction address
– get the instruction from memory
– read registers
– use the instruction to decide exactly what to do
12/22/2011 Chapter 4 — The Processor 4
Instruction Execution
• PC → instruction memory, fetch instruction
• Register numbers → register file, read registers
• Depending on instruction class
– Use ALU to calculate
• Arithmetic result
• Memory address for load/store
• Branch target address
– Access data memory for load/store
– PC ← target address or PC + 4
Abstract View
• Two types of functional units:
– elements that operate on data values (combinational)
– elements that contain state (sequential)
Multiplexers
• Can't just join wires together
• Use multiplexers
Control
Timing
• Clocks used in synchronous logic
– when should an element that contains state be updated?
• Edge-triggered timing
[Figure: clock waveform labeling cycle time, rising edge, and falling edge]
Edge Triggered Timing
• State updated at clock edge
• Read contents of some state elements,
• Send values through some combinational logic
• Write results to one or more state elements
[Figure: state element 1 → combinational logic → state element 2, within one clock cycle]
Logic Design Basics
§4.2 Logic Design Conventions
• Information encoded in binary
– Low voltage = 0, High voltage = 1
– One wire per bit
– Multi-bit data encoded on multi-wire buses
• Combinational element
– Operate on data
– Output is a function of input
• State (sequential) elements
– Store information
Combinational Elements
• AND gate
– Y = A & B
• Multiplexer
– Y = S ? I1 : I0
• Adder
– Y = A + B
• Arithmetic/Logic Unit
– Y = F(A, B)
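As a rough illustration, the four combinational elements can be modelled as pure functions: the output depends only on the current inputs. This is a sketch, not hardware. The function names, the 32-bit mask, and the string encoding of the ALU function F are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # assumption: values travel on 32-bit buses


def and_gate(a: int, b: int) -> int:
    """Y = A & B"""
    return a & b


def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0"""
    return i1 if s else i0


def adder(a: int, b: int) -> int:
    """Y = A + B, wrapped to 32 bits."""
    return (a + b) & MASK32


def alu(f: str, a: int, b: int) -> int:
    """Y = F(A, B); F is selected by a string here (hypothetical encoding)."""
    if f == "and":
        return a & b
    if f == "or":
        return a | b
    if f == "add":
        return (a + b) & MASK32
    if f == "sub":
        return (a - b) & MASK32
    raise ValueError(f"unknown ALU function: {f}")
```

Because none of these functions keeps state, calling one twice with the same inputs always gives the same output, which is the defining property of a combinational element.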
Sequential Elements
• Register: stores data in a circuit
– Uses a clock signal to determine when to update the stored value
– Edge-triggered: update when Clk changes from 0 to 1
[Figure: register symbol with inputs D and Clk and output Q; timing diagram of Clk, D, and Q]
Sequential Elements
• Register with write control
– Only updates on clock edge when write control input is 1
– Used when stored value is required later
[Figure: register symbol with inputs D, Clk, and Write and output Q; timing diagram of Clk, Write, D, and Q]
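The behaviour of an edge-triggered register with write control can be sketched in Python as follows. The class name `Register` and the method `tick` are hypothetical names for this example. The model only captures the rule stated on the slides: Q changes on a 0-to-1 clock transition, and only when Write is 1.

```python
class Register:
    """Behavioural model of an edge-triggered register with write control."""

    def __init__(self, value: int = 0):
        self.q = value        # stored value, visible on output Q
        self._prev_clk = 0    # last clock level seen, used to detect edges

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        """Present clock level, data input D, and Write; return Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d        # update only on a rising edge with Write = 1
        self._prev_clk = clk
        return self.q
```

With `write` left at its default of 1 the class behaves like the plain register; passing `write=0` on a rising edge leaves the stored value unchanged.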
Clocking Methodology
• Combinational logic transforms data during clock cycles
– Between clock edges
– Input from state elements, output to state element
– Longest delay determines clock period
Components for Simple Implementation
• Functional units needed for each instruction
– Instruction memory: Instruction address in, Instruction out
– Program counter (PC)
– Adder: two inputs, Sum out
– Registers: Read register 1, Read register 2, and Write register (5-bit register numbers) and Write data in; Read data 1 and Read data 2 out; RegWrite control
– ALU: two data inputs and 3-bit ALU control in; ALU result and Zero out
– Data memory unit: Address and Write data in; Read data out; MemRead and MemWrite controls
– Sign-extension unit: 16 bits in, 32 bits out
Instruction Fetch
• PC is a 32-bit register
• Increment by 4 for next instruction
R-Format Instructions
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
Load/Store Instructions
• Read register operands
• Calculate address using 16-bit offset
– Use ALU, but sign-extend offset
• Load: Read memory and update register
• Store: Write register value to memory
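The load/store steps above can be sketched in Python. This is a minimal model under stated assumptions: the register file is a list of 32 integers, data memory is a dictionary keyed by byte address, and the helper names (`sign_extend16`, `effective_address`, `lw`, `sw`) are made up for the example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base: int, offset16: int) -> int:
    """ALU add: base register plus sign-extended 16-bit offset."""
    return (base + sign_extend16(offset16)) & MASK32


def lw(regs: list, mem: dict, rt: int, rs: int, offset16: int) -> None:
    """Load: read memory and update register rt."""
    regs[rt] = mem.get(effective_address(regs[rs], offset16), 0)


def sw(regs: list, mem: dict, rt: int, rs: int, offset16: int) -> None:
    """Store: write the value of register rt to memory."""
    mem[effective_address(regs[rs], offset16)] = regs[rt]
```

Both instructions share the same address calculation, which is why the datapath can reuse one ALU for them.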
Branch Instructions
• Read register operands
• Compare operands
– Use ALU, subtract and check Zero output
• Calculate target address
– Sign-extend displacement
– Shift left 2 places (word displacement)
– Add to PC + 4
• Already calculated by instruction fetch
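The branch steps above amount to a small piece of arithmetic, sketched here in Python. The function names are hypothetical, and the PC and operands are modelled as plain integers.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, offset16: int) -> int:
    """Sign-extend the displacement, shift left 2, add to PC + 4."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


def beq_next_pc(pc: int, a: int, b: int, offset16: int) -> int:
    """Subtract in the ALU; take the branch when the Zero output is set."""
    zero = ((a - b) & MASK32) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK32
```

For example, a displacement of 25 words from PC = 0 gives a target of 4 + 100 = 104, and a displacement of -1 word branches back to the branch instruction itself.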
Branch Instructions
• Shift left 2: just re-routes wires
• Sign extend: sign-bit wire replicated
Composing the Elements
• First-cut data path does an instruction in one clock cycle
– Each datapath element can only do one function at a time
– Hence, we need separate instruction and data memories
• Use multiplexers where alternate data sources are used for
different instructions
R-Type/Load/Store Datapath
Full Datapath
Adding Control
• Selecting the operations to perform (ALU, read/write, etc.)
• Controlling the flow of data (multiplexor inputs)
• Information comes from the 32 bits of the instruction
R: op | rs | rt | rd | shamt | funct
I: op | rs | rt | 16-bit address
J: op | 26-bit address
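Extracting the fields from a 32-bit instruction word is a matter of shifts and masks. The sketch below uses the standard MIPS field positions (op in bits 31:26, rs 25:21, rt 20:16, rd 15:11, shamt 10:6, funct 5:0); the function name `decode` and the dictionary keys `imm16` and `addr26` are assumptions for the example.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into all candidate fields.

    Which fields are meaningful depends on the format (R, I, or J);
    the hardware extracts them all in parallel and control picks.
    """
    return {
        "op": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "shamt": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
        "imm16": instr & 0xFFFF,       # I-format 16-bit address/offset
        "addr26": instr & 0x3FFFFFF,   # J-format 26-bit address
    }


# add $t0,$s1,$s2 written field by field (op rs rt rd shamt funct)
fields = decode(0b000000_10001_10010_01000_00000_100000)
```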
MIPS Instructions
• add $t0,$s1,$s2
000000 10001 10010 01000 00000 100000
op rs rt rd shamt funct
• lw $t0,256($t1)
100011 01001 01000 0000 0001 0000 0000
op rs rt offset
MIPS Instructions Continued
• beq $s1,$s2,25 => byte offset 100 (25 words x 4)
000100 10001 10010 0000 0000 0001 1001
op rs rt offset
• j 1024 => byte address 4096 (1024 words x 4), with PC+4[31-28] as the upper 4 bits
000010 00 0000 0000 0000 0100 0000 0000
op address
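The four encodings shown above (add, lw, beq, j) can be reproduced by packing the fields with shifts. This is a sketch: the helper names `encode_r`, `encode_i`, and `encode_j` are made up for the example, and the register numbers are the standard MIPS assignments ($t0 = 8, $t1 = 9, $s1 = 17, $s2 = 18).

```python
def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """R-format: op | rs | rt | rd | shamt | funct."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """I-format: op | rs | rt | 16-bit address."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op: int, addr: int) -> int:
    """J-format: op | 26-bit address."""
    return (op << 26) | (addr & 0x3FFFFFF)


T0, T1, S1, S2 = 8, 9, 17, 18  # standard MIPS register numbers

add_word = encode_r(0b000000, S1, S2, T0, 0, 0b100000)  # add $t0,$s1,$s2
lw_word = encode_i(0b100011, T1, T0, 256)               # lw $t0,256($t1)
beq_word = encode_i(0b000100, S1, S2, 25)               # beq $s1,$s2,25
j_word = encode_j(0b000010, 1024)                       # j 1024

for word in (add_word, lw_word, beq_word, j_word):
    print(format(word, "032b"))
```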
Determining ALU Control Bits
• ALUOp determined by instruction
Opcode  ALUOp  Operation   funct   ALU action  ALU control
LW      00     load word   xxxxxx  add         010
SW      00     store word  xxxxxx  add         010
BEQ     01     branch eq   xxxxxx  sub         110
R-type  10     add         100000  add         010
R-type  10     sub         100010  sub         110
R-type  10     and         100100  and         000
R-type  10     or          100101  or          001
R-type  10     slt         101010  slt         111
• Control Lines
000 and
001 or
010 add
110 sub
111 slt
ALU Control
• Must describe hardware to compute the 3-bit ALU control input
– given instruction type (ALUOp, computed from instruction type)
00 = lw, sw
01 = beq
10 = arithmetic
– function code for arithmetic
• Describe it using a truth table (can turn into gates)
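The ALU control truth table can be written out as a small lookup, sketched here in Python. The function name `alu_control` is hypothetical; the ALUOp, funct, and 3-bit control values are the ones listed on the slides.

```python
# funct field -> 3-bit ALU control, used only when ALUOp = 10 (arithmetic)
FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Compute the 3-bit ALU control input from ALUOp and funct."""
    if alu_op == 0b00:   # lw, sw: add to form the address (funct is don't-care)
        return 0b010
    if alu_op == 0b01:   # beq: subtract to compare (funct is don't-care)
        return 0b110
    if alu_op == 0b10:   # arithmetic: decided by the function code
        return FUNCT_TO_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")
```

A dictionary lookup plays the role of the truth table here; in hardware the same table is minimized into a few gates.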
Datapath with Control
[Figure: single-cycle datapath with control. Instruction [31–26] feeds the Control unit, which drives RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp. Instruction [25–21] and [20–16] select the read registers, Instruction [20–16] or [15–11] selects the write register, Instruction [15–0] is sign-extended from 16 to 32 bits, and Instruction [5–0] feeds ALU control.]
Control Line Settings
• 8 control lines (control read/write and multiplexors)
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
R-format     1       0       0         1         0        0         0       Func Code
lw           0       1       1         1         1        0         0       add
sw           X       1       X         0         0        1         0       add
beq          X       0       X         0         0        0         1       sub
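The control line settings can be expressed as a table indexed by opcode, sketched here in Python. Some choices are assumptions for the example: don't-care entries (X) are represented as `None`, ALUOp is given as the 2-bit code from the ALU control slides (10 = function code, 00 = add, 01 = sub), and the opcodes are the standard MIPS values (sw = 101011 does not appear on the slides).

```python
X = None  # don't care

FIELDS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
          "MemRead", "MemWrite", "Branch", "ALUOp")

# opcode -> control line settings, one row per instruction class
ROWS = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 0b10),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0b00),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0b00),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0b01),  # beq
}


def control(opcode: int) -> dict:
    """Return the control line settings for a 6-bit opcode."""
    return dict(zip(FIELDS, ROWS[opcode]))
```

For instance, `control(0b100011)` describes lw: the ALU takes the sign-extended offset (ALUSrc = 1), memory is read (MemRead = 1), and the loaded value is written to the register file (MemtoReg = 1, RegWrite = 1).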
R-Type Instruction
Load Instruction
Branch-on-Equal Instruction
Implementing Jumps
• Jump uses word address
• Update PC with concatenation of
– Top 4 bits of old PC
– 26-bit jump address
– 00
• Need an extra control signal decoded from opcode
• Jump format: opcode 2 in bits 31:26, address in bits 25:0
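The concatenation that forms the jump target can be sketched in Python. The function name `jump_target` is hypothetical, and the sketch assumes the top 4 bits are taken from PC + 4, as in the earlier `j 1024` example.

```python
def jump_target(pc: int, addr26: int) -> int:
    """Concatenate (PC + 4)[31:28], the 26-bit jump address, and 00."""
    upper4 = (pc + 4) & 0xF0000000          # top 4 bits of the incremented PC
    return upper4 | ((addr26 & 0x3FFFFFF) << 2)  # shift left 2 appends 00
```

With PC = 0, `jump_target(0, 1024)` yields byte address 4096, matching the word-to-byte conversion of the jump address.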
Datapath With Jumps Added
Shortcomings of a Single Cycle Implementation
• Limits reuse of hardware components
– each functional unit can be used only once per cycle
– e.g. separate instruction and data memories required
• Inefficient
– clock cycle determined by longest possible path in the machine
– e.g. assume time for:
• Memory units = 200 ps
• ALU and adders = 100 ps
• Register file (read or write) = 50 ps
Instruction class  Instruction memory  Register read  ALU operation  Data memory  Register write  Total
R-type             200                 50             100            0            50              400 ps
Load word          200                 50             100            200          50              600 ps
Store word         200                 50             100            200          -               550 ps
Branch             200                 50             100            0            -               350 ps
Jump               200                 -              -              -            -               200 ps
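The totals in the table, and the resulting single-cycle clock period, follow directly from the unit delays. The sketch below recomputes them in Python; the dictionary names and unit labels are made up for the example.

```python
# Unit delays in picoseconds, as assumed on the slide
UNIT_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Functional units on the critical path of each instruction class
USED = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "Load word": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store word": ["imem", "reg_read", "alu", "dmem"],
    "Branch": ["imem", "reg_read", "alu"],
    "Jump": ["imem"],
}

latency = {cls: sum(UNIT_PS[u] for u in units) for cls, units in USED.items()}

# In a single-cycle design every instruction takes one cycle,
# so the clock period must cover the slowest instruction class.
clock_period = max(latency.values())

for cls, ps in latency.items():
    print(f"{cls}: {ps} ps")
print(f"clock period: {clock_period} ps")
```

Load word sets the clock period, so every other instruction class waits out time it does not need.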
Single Cycle Model is inefficient!
• Assume 25% loads, 10% stores, 45% ALU instructions, 15% branches, and 5% jumps
CPU execution time = Instruction count x CPI x Clock cycle time
Performance ratio
= CPU Performance (Multicycle impl.) / CPU Performance (Single cycle impl.)
= CPU Exec. Time (Single cycle impl.) / CPU Exec. Time (Multicycle impl.)
= 600 / (600 x 25% + 550 x 10% + 400 x 45% + 350 x 15% + 200 x 5%)
= 600 ps / 447.5 ps
≈ 1.34 times faster
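The ratio can be checked with a few lines of Python. This sketch uses the per-class times from the earlier table and the instruction mix stated above; percentages are kept as integers so the weighted average is exact. The variable names are made up for the example.

```python
# Time per instruction class in ps, from the single-cycle timing table
latency_ps = {"load": 600, "store": 550, "alu": 400, "branch": 350, "jump": 200}

# Instruction mix in percent, as assumed on the slide
mix_pct = {"load": 25, "store": 10, "alu": 45, "branch": 15, "jump": 5}

single_cycle_ps = max(latency_ps.values())  # every instruction pays the worst case

# Average time per instruction if each class took only its own time
average_ps = sum(latency_ps[k] * mix_pct[k] for k in latency_ps) / 100

ratio = single_cycle_ps / average_ps

print(f"average: {average_ps} ps, ratio: {ratio:.2f}")
```

Instruction count and CPI are the same on both sides of the comparison here, so they cancel and only the time per instruction matters.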