Date post: | 05-Jan-2016 |
Upload: | simon-tate |
Computer Organization CS224
Fall 2012
Lesson 22
The Big Picture
The Five Classic Components of a Computer
Chapter 4 Topic: Processor Design
Processor
- Control
- Datapath
Memory
Input
Output
Introduction
CPU performance factors
Instruction count
- Determined by ISA and compiler
CPI and Cycle time
- Determined by CPU hardware
We will examine two MIPS implementations
- A simplified version
- A more realistic pipelined version
Simple subset, shows most aspects
- Memory reference: lw, sw
- Arithmetic/logical: add, sub, and, ori, slt
- Control transfer: beq, j
§4.1 Introduction
The Performance Perspective
Performance of a machine is determined by:
- Instruction count
- Clock cycle time
- Clock cycles per instruction
Processor design (datapath and control) will determine:
- Clock cycle time (CCT)
- Clock cycles per instruction (CPI)
This week: Single cycle processor (datapath + control)
- Advantage: One clock cycle per instruction
- Disadvantage: long cycle time
[Figure: triangle relating Inst. Count, CPI, and Cycle Time]
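The three factors combine multiplicatively: CPU time = instruction count × CPI × clock cycle time. A minimal sketch of that product follows; the function name and all numbers are illustrative assumptions, not figures from the lecture.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time in nanoseconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical single cycle design: CPI is 1, but the cycle must be long
# enough for the slowest instruction (8 ns is an assumed value).
single_cycle_ns = cpu_time_ns(1_000_000, 1, 8)
print(single_cycle_ns)
```

Lowering the cycle time only helps if CPI does not rise by a larger factor, which is why the design of datapath and control affects both terms.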
Processor Design Steps
1. Analyze instruction set => datapath requirements
- the meaning of each instruction is given by the register transfers (ISA model => RTL model)
- datapath must include storage element for ISA registers (possibly more)
- datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the RTL requirements
Processor Design (cont’d)
4. Analyze implementation of each instruction to determine setting of control points that effect the register transfer.
5. Assemble the control logic
6. RTL datapath and control design are refined to track physical design and functional validation
- Changes made for timing and errata (a.k.a. “bug”) fixes
- Amount of work varies with capabilities of CAD tools and degree of optimization for cost/performance
Subset of Instructions
To simplify our study of processor design, we will focus on a subset of the MIPS instructions
- Memory: lw and sw
- Arithmetic: add, sub, and, ori, and slt
- Branch: beq and j
The lecture example uses ori rather than the or covered in the text, to demonstrate one more category of instructions
The method of implementing other instructions should come naturally from these
MIPS Format Review
R-Format
- add rd, rs, rt
- sub rd, rs, rt

Field   Bits  Meaning
OP=0    6     opcode
rs      5     first source register
rt      5     second source register
rd      5     result register
sa      5     shift amount
funct   6     function code
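As a sketch of how the six R-format fields pack into one 32-bit word, the helpers below shift each field into place. The function names are hypothetical. The register numbers ($t0 = 8, $s1 = 17, $s2 = 18) and the funct code for add (0x20) are standard MIPS values that do not appear on the slides.

```python
def encode_r(rs, rt, rd, sa, funct):
    """Pack R-format fields: OP(6)=0 | rs(5) | rt(5) | rd(5) | sa(5) | funct(6)."""
    fields = (("rs", rs, 5), ("rt", rt, 5), ("rd", rd, 5), ("sa", sa, 5), ("funct", funct, 6))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct

def decode_r(word):
    """Split a 32-bit word back into its R-format fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "sa": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }

# add $t0, $s1, $s2  ->  rd = 8, rs = 17, rt = 18, funct = 0x20
word = encode_r(rs=17, rt=18, rd=8, sa=0, funct=0x20)
print(hex(word))  # 0x2324020
```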
MIPS Format Review (cont)
I-Format
- lw rt, rs, imm
- sw rt, rs, imm
- beq rs, rt, imm
- ori rt, rs, imm
Reminders
- Branch uses PC Relative addressing (PC + 4 + 4 × imm)

Field  Bits  Meaning
OP     6     opcode
rs     5     first source register
rt     5     second source register
imm    16    immediate
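The PC-relative formula PC + 4 + 4 × imm can be sketched as below. The helper names are hypothetical, and the sketch assumes the 16-bit immediate is sign-extended before use, as in standard MIPS, so that backward branches work.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def branch_target(pc, imm):
    """Branch target = PC + 4 + 4 * imm, wrapped to 32 bits."""
    return (pc + 4 + 4 * sign_extend16(imm)) & 0xFFFFFFFF

print(hex(branch_target(0x1000, 3)))       # 0x1010: three instructions past PC + 4
print(hex(branch_target(0x1000, 0xFFFF)))  # 0x1000: imm = -1 branches to itself
```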
MIPS Format Review (cont)
J-Format
- j target
Reminders
- Uses pseudodirect addressing (target × 4) to allow addressing 2^28 bytes directly
- Uses top 4 bits from PC

Field   Bits  Meaning
OP      6     opcode
target  26    jump target address
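A sketch of pseudodirect addressing: the 26-bit target is multiplied by 4 and combined with the top 4 bits of the PC. The function name is hypothetical, and the sketch assumes the upper bits come from PC + 4, as in the standard MIPS definition.

```python
def jump_target(pc, target26):
    """Pseudodirect address: top 4 bits of (PC + 4), then target * 4 in the low 28 bits."""
    upper = (pc + 4) & 0xF0000000
    lower = (target26 & 0x03FFFFFF) << 2
    return upper | lower

print(hex(jump_target(0x00400000, 0x0100000)))  # 0x400000
print(hex(jump_target(0x90000000, 1)))          # 0x90000004
```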
Execution Cycle
- Instruction Fetch: Obtain instruction from program storage
- Instruction Decode: Determine required actions and instruction size
- Operand Fetch: Locate and obtain operand data
- Execute: Compute result value or status
- Result Store: Deposit results in storage for later use
- Next Instruction: Determine successor instruction
What Happens?
It’s hard to see how we should go about organizing the processor
To start thinking about it, look at what happens on each instruction
- The instruction specified by the PC is fetched from memory
- One or two registers are read (lw vs. add for instance)
- The ALU must be used to add, subtract, etc.
- The results are stored (to memory or a register)
Instruction Execution
PC → instruction memory, fetch instruction
Register numbers → register file, read registers
Depending on instruction class
Use ALU to calculate
- Arithmetic result
- Memory address for load/store
- Branch target address
Access data memory for load/store
PC ← target address or PC + 4
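The steps above can be sketched as a small behavioral model of the instruction subset. This is not the hardware design from the lecture. Instructions are stored as pre-decoded tuples rather than 32-bit words, and every name here (step, run, program, the tuple layout) is a hypothetical choice made for illustration.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def write_reg(regs, number, value):
    """Store a result; register 0 stays hard wired to 0."""
    if number != 0:
        regs[number] = value & MASK

def step(pc, regs, imem, dmem):
    """Execute the instruction at pc and return the next PC."""
    op, a, b, c = imem[pc]                  # PC -> instruction memory
    next_pc = pc + 4
    if op in ("add", "sub", "and", "slt"):  # tuple is (op, rd, rs, rt)
        x, y = regs[b], regs[c]             # register numbers -> register file
        if op == "add":
            result = x + y
        elif op == "sub":
            result = x - y
        elif op == "and":
            result = x & y
        else:
            result = int(to_signed(x) < to_signed(y))
        write_reg(regs, a, result)
    elif op == "ori":                       # (op, rt, rs, imm), zero-extended imm
        write_reg(regs, a, regs[b] | (c & 0xFFFF))
    elif op == "lw":                        # (op, rt, rs, imm): ALU forms the address
        write_reg(regs, a, dmem.get((regs[b] + c) & MASK, 0))
    elif op == "sw":                        # (op, rt, rs, imm)
        dmem[(regs[b] + c) & MASK] = regs[a]
    elif op == "beq":                       # (op, rs, rt, imm), imm in instructions
        if regs[a] == regs[b]:
            next_pc = pc + 4 + 4 * c
    elif op == "j":                         # (op, target, unused, unused)
        next_pc = ((pc + 4) & 0xF0000000) | (a << 2)
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return next_pc                          # PC <- target address or PC + 4

def run(imem):
    """Run from address 0 until the PC leaves the program."""
    pc, regs, dmem = 0, [0] * 32, {}
    while pc in imem:
        pc = step(pc, regs, imem, dmem)
    return pc, regs, dmem

program = {
    0: ("ori", 1, 0, 5),     # $1 = 5
    4: ("ori", 2, 0, 7),     # $2 = 7
    8: ("add", 3, 1, 2),     # $3 = 12
    12: ("sw", 3, 0, 100),   # mem[100] = $3
    16: ("lw", 4, 0, 100),   # $4 = mem[100]
    20: ("beq", 3, 4, 1),    # taken: skips the instruction at 24
    24: ("ori", 5, 0, 1),    # skipped
    28: ("slt", 6, 1, 2),    # $6 = 1 because 5 < 7
}
pc, regs, dmem = run(program)
print(pc, regs[3], regs[4], regs[5], regs[6], dmem[100])
```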
Processor Overview
• Data flows through memory and functional units
Multiplexers
Can’t just join wires together
- Use multiplexers
Control
Logic Design Basics
§4.2 Logic Design Conventions
Information encoded in binary
- Low voltage = 0, High voltage = 1
- One wire per bit
- Multi-bit data encoded on multi-wire buses
Combinational element
- Operate on data
- Output is a function of input
- Example: ALU
State (sequential) elements
- Store information or state
- Example: Register File
1-bit Full Adder
1 bit ALU
Using a MUX we can add the AND, OR, and adder operations into a single ALU
[Figure: 1-bit ALU with inputs A, B, Cin, and ALUOp; a Mux selects Result; carry output Cout]
4 bit ALU
[Figure: 4-bit ALU built from four 1-bit ALUs; stage i takes Ai, Bi, and CIni and produces Resulti and COuti, with each COut feeding the next CIn; ALUop is shared by all stages; A and B are 4 bits wide]
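A minimal sketch of the 1-bit ALU and of chaining four of them into a 4-bit ALU. The ALUOp encoding (0 = AND, 1 = OR, 2 = ADD) and the function names are assumptions made for illustration; the slides do not give an encoding.

```python
AND, OR, ADD = 0, 1, 2  # hypothetical ALUOp encoding

def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def alu_1bit(a, b, cin, aluop):
    """All three units compute in parallel; the mux picks one as Result."""
    and_out = a & b
    or_out = a | b
    sum_out, cout = full_adder(a, b, cin)
    result = (and_out, or_out, sum_out)[aluop]  # the mux
    return result, cout

def alu_4bit(a, b, aluop, cin=0):
    """Ripple four 1-bit ALUs; each COut feeds the next stage's CIn."""
    result, carry = 0, cin
    for i in range(4):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, aluop)
        result |= bit << i
    return result, carry  # carry is COut3, meaningful for ADD only

print(alu_4bit(0b0101, 0b0011, ADD))  # (8, 0)
print(alu_4bit(0b1111, 0b0001, ADD))  # (0, 1): the sum overflows 4 bits
```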
Combinational Elements
- Adder: 32-bit inputs A and B, plus Carry_In; outputs 32-bit Sum and Carry
- MUX: 32-bit inputs A and B, plus Select; output 32-bit Y
- ALU: 32-bit inputs A and B, plus OP; outputs 32-bit Result and Zero
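The three blocks can be modeled as pure functions, since each output depends only on the current inputs. The names, the convention that Select = 0 picks A, and the string operation names for the ALU are assumptions; the slides only show the block interfaces.

```python
MASK32 = 0xFFFFFFFF

def adder32(a, b, carry_in=0):
    """Returns (Sum, Carry) for 32-bit inputs."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def mux32(a, b, select):
    """Y = A when Select is 0, B when Select is 1 (assumed polarity)."""
    return b if select else a

def alu32(a, b, op):
    """Returns (Result, Zero); op names are hypothetical."""
    if op == "add":
        result, _ = adder32(a, b)
    elif op == "sub":
        result, _ = adder32(a, ~b & MASK32, 1)  # a - b computed as a + (~b) + 1
    elif op == "and":
        result = a & b & MASK32
    elif op == "or":
        result = (a | b) & MASK32
    else:
        raise ValueError(f"unsupported op: {op}")
    return result, int(result == 0)

print(adder32(0xFFFFFFFF, 1))  # (0, 1)
print(alu32(5, 5, "sub"))      # (0, 1): Zero is set, which is how beq can test equality
```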
D Latches
Modified SR Latch
Latches value when C is asserted
[Figure: D latch with inputs C and D; outputs Q and its complement]
D Flip Flop
Uses Master/Slave D Latches
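[Figure: D flip flop built from two D latches in series; inputs D and CLK; the first latch's Q feeds the second latch's D; outputs Q and its complement]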
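A behavioral sketch of the master/slave arrangement. It assumes the master latch is enabled while CLK is 1 and the slave while CLK is 0, so the output changes on the falling edge; that polarity and the class names are assumptions made for illustration.

```python
class DLatch:
    """Level-sensitive latch: Q follows D while C is asserted, else holds."""
    def __init__(self):
        self.q = 0

    def update(self, c, d):
        if c:
            self.q = d
        return self.q

class DFlipFlop:
    """Two D latches in series, clocked on opposite phases."""
    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, clk, d):
        m = self.master.update(clk, d)        # master is open while CLK = 1
        return self.slave.update(1 - clk, m)  # slave is open while CLK = 0

ff = DFlipFlop()
print(ff.update(1, 1))  # 0: CLK high, master captures D but Q is unchanged
print(ff.update(0, 0))  # 1: falling edge, Q takes the value the master held
print(ff.update(1, 0))  # 1: CLK high again, Q holds
print(ff.update(0, 1))  # 0: next falling edge, Q takes the 0 sampled while high
```

Because the two latches are never open at the same time, a change on D can never pass straight through to Q within one clock phase.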
Storage Element: Register
Register
Similar to D Flip Flop
- N bit input and output
- Write Enable input
Write Enable
- 0: Data Out will not change
- 1: Data Out will become Data In
Data changes only on falling edge!
[Figure: N-bit register with inputs Clk, Data In (N bits), and Write Enable; output Data Out (N bits)]
Storage Element: Reg File
Register File consists of 32 registers
- Two 32 bit output busses: busA and busB
- One 32 bit input bus: busW
- Register 0 hard wired to value 0
Register selected by
- RA selects register to put on busA
- RB selects register to put on busB
- RW selects register to be written via busW when Write Enable is 1
Clock input (CLK)
- CLK input is a factor only for write operation
- During read, behaves as combinational logic block
- RA or RB stable → busA or busB valid after “access time”
- Minor simplification of reality
[Figure: register file of 32 32-bit registers; inputs Clk, Write Enable, busW (32 bits), and RW, RA, RB (5 bits each); outputs busA and busB (32 bits each)]
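A minimal behavioral sketch of the register file interface described above: reads are combinational, writes happen only on a clock edge with Write Enable set. The class and method names are hypothetical, and access time is not modeled.

```python
class RegisterFile:
    """32 registers of 32 bits; register 0 always reads as 0."""
    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (busA, busB) for the selected registers."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """On the clock edge, write busW into register RW if Write Enable is 1."""
        if write_enable and rw != 0:  # writes to register 0 are discarded
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, write_enable=1)
rf.clock_edge(rw=0, bus_w=99, write_enable=1)  # no effect: register 0 is hard wired
rf.clock_edge(rw=5, bus_w=7, write_enable=0)   # no effect: Write Enable is 0
print(rf.read(ra=5, rb=0))  # (42, 0)
```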
Storage Element: Memory
Memory
- One input bus: Data In
- One output bus: Data Out
Address selection
- Address selects the word to put on Data Out
- To write to address, set Write Enable to 1
Clock input (CLK)
- CLK input is a factor only for write operation
- During read, behaves as combinational logic block
- Valid Address → Data Out valid after “access time”
- Minor simplification of reality
[Figure: memory block with inputs Clk, Write Enable, Address, and Data In (32 bits); output Data Out (32 bits)]
Some Logic Design…
All storage elements have same clock
- Edge-triggered clocking
- “Instantaneous” state change (simplification!)
- Timing always works if the clock is slow enough
Cycle Time = Clk-to-Q + Longest Delay + Setup + Clock Skew
[Figure: Clk waveform with Setup and Hold windows around each clock edge; data is Don’t Care outside those windows]
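The cycle time formula can be evaluated directly. The delay values below are hypothetical, in picoseconds, chosen only to show how the terms add up; the function name is also an assumption.

```python
def min_cycle_time_ps(clk_to_q, longest_delay, setup, clock_skew):
    """Cycle Time = Clk-to-Q + Longest Delay + Setup + Clock Skew (all in ps)."""
    return clk_to_q + longest_delay + setup + clock_skew

# Assumed delays: the longest combinational path dominates the total.
cycle_ps = min_cycle_time_ps(clk_to_q=50, longest_delay=800, setup=30, clock_skew=20)
print(cycle_ps)                # 900
print(1e12 / cycle_ps / 1e9)   # maximum clock frequency in GHz
```

Any clock period at or above this value satisfies the timing constraint, which is the sense in which timing always works if the clock is slow enough.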