Date post: 18-Jan-2021
CS 110 Computer Architecture
Lecture 10: Datapath

Instructor: Sören Schwertfeger
http://shtech.org/courses/ca/
School of Information Science and Technology (SIST)
ShanghaiTech University

Slides based on UC Berkeley's CS61C


Review

• Timing constraints for Finite State Machines
  – Setup time, hold time, clock-to-Q time
• Use muxes to select among inputs
  – S control bits select from 2^S inputs
  – Each input can be n bits wide, independent of S
  – Can implement muxes hierarchically
• ALU can be implemented using a mux
  – Coupled with basic block elements
  – Adder/Subtractor & AND & OR & shift

Components of a Computer

[Figure: block diagram of a computer. Labels: Processor (Control; Datapath with PC, Registers, Arithmetic & Logic Unit (ALU)); Memory (Program, Data; Bytes); Input; Output; Processor-Memory Interface (Enable?, Read/Write, Address, Write Data, Read Data); I/O-Memory Interfaces.]

The CPU

• Processor (CPU): the active part of the computer that does all the work (data manipulation and decision-making)
• Datapath: portion of the processor that contains hardware necessary to perform operations required by the processor
• Control: portion of the processor (also in hardware) that tells the datapath what needs to be done

One-Instruction-Per-Cycle RISC-V Machine

• One clock tick => one instruction
• Current state outputs => inputs to combinational logic => outputs settle at the next-state values before the next clock edge
• Rising clock edge:
  – all state elements are updated with combinational logic outputs
  – execution moves to next clock cycle

[Figure: state elements (PC, Registers, Instr. Mem, Data Mem) connected to a Combinational Logic block, all driven by a common clock.]

Datapath and Control

• Datapath designed to support data transfers required by instructions
• Controller causes correct transfers to happen

[Figure: datapath with PC, +4 adder, instruction memory, registers (rd, rs, rt), imm, ALU, and data memory; the Controller takes opcode and funct as inputs.]

Stages of the Datapath: Overview

• Problem: a single, "monolithic" block that "executes an instruction" (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient
• Solution: break up the process of "executing an instruction" into stages, and then connect the stages to create the whole datapath
  – smaller stages are easier to design
  – easy to optimize (change) one stage without touching the others (modularity)

Five Stages of Instruction Execution

• Stage 1: Instruction Fetch (IF)
• Stage 2: Instruction Decode (ID)
• Stage 3: Execute (EX): ALU (Arithmetic-Logic Unit)
• Stage 4: Memory Access (MEM)
• Stage 5: Register Write (WB)

Stages of Execution on Datapath

[Figure: datapath (PC, +4 adder, instruction memory, registers with rd/rs/rt, imm, ALU, data memory) divided into five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Register Write.]

Stages of Execution (1/5)

• There is a wide variety of RISC-V instructions: so what general steps do they have in common?
• Stage 1: Instruction Fetch
  – no matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
  – also, this is where we increment PC (that is, PC = PC + 4, to point to the next instruction: byte addressing so +4)

Stages of Execution (2/5)

• Stage 2: Instruction Decode
  – upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data)
  – first, read the opcode to determine instruction type and field lengths
  – second, read in data from all necessary registers
    • for add, read two registers
    • for addi, read one register
  – third, generate the immediates

Stages of Execution (3/5)

• Stage 3: ALU (Arithmetic-Logic Unit)
  – the real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |)
  – what about loads and stores?
    • lw t0, 40(t1)
    • the address we are accessing in memory = the value in t1 PLUS the value 40
    • so we do this addition in this stage
  – also does stuff for other instructions…

Stages of Execution (4/5)

• Stage 4: Memory Access
  – actually only the load and store instructions do anything during this stage; the others remain idle during this stage or skip it altogether
  – since these instructions have a unique step, we need this extra stage to account for them
  – as a result of the cache system, this stage is expected to be fast

Stages of Execution (5/5)

• Stage 5: Register Write
  – most instructions write the result of some computation into a register
  – examples: arithmetic, logical, shifts, loads, jumps
  – what about stores, branches?
    • don't write anything into a register at the end
    • these remain idle during this fifth stage or skip it altogether
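
The five stages can be traced in software. The following is a minimal Python sketch, not the lecture's own code: the function name `run_one_instruction`, the pre-decoded instruction dictionary, and the dictionary-based memories are all assumptions made for illustration. It walks `lw t0, 40(t1)` through the stages, using the standard register numbers t0 = x5 and t1 = x6.

```python
def run_one_instruction(pc, imem, regs, dmem):
    """Walk one instruction through the five stages and return the next PC.

    Hypothetical helper: instructions are stored already decoded, as dicts.
    Only "lw" and "addi" style instructions are handled in this sketch.
    """
    # Stage 1: Instruction Fetch (and increment PC by 4 bytes)
    inst = imem[pc]
    next_pc = pc + 4

    # Stage 2: Instruction Decode and register read
    op, rd, rs1, imm = inst["op"], inst["rd"], inst["rs1"], inst["imm"]
    base = regs[rs1]

    # Stage 3: Execute (the ALU adds the register value and the immediate)
    alu = base + imm

    # Stage 4: Memory Access (only loads and stores do work here)
    mem = dmem[alu] if op == "lw" else None

    # Stage 5: Register Write
    regs[rd] = mem if op == "lw" else alu
    return next_pc


# lw t0, 40(t1): the address is the value in t1 plus 40
imem = {0: {"op": "lw", "rd": 5, "rs1": 6, "imm": 40}}
regs = [0] * 32
regs[6] = 1000
dmem = {1040: 0xCAFE}
next_pc = run_one_instruction(0, imem, regs, dmem)
```

After the call, `regs[5]` holds the word read from address 1040 and `next_pc` is 4.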

Stages of Execution on Datapath

[Figure: datapath (PC, +4 adder, instruction memory, registers with rd/rs/rt, imm, ALU, data memory) divided into five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Register Write.]

Datapath Components: Combinational

• Combinational Elements
• Storage Elements + Clocking Methodology
• Building Blocks

[Figure: three combinational building blocks.
 Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and CarryOut.
 Multiplexer (MUX): 32-bit inputs A and B; Select chooses the 32-bit output Y.
 ALU: 32-bit inputs A and B; OP selects the operation; 32-bit output Result.]

Datapath Elements: State and Sequencing (1/3)

• Register
• Write Enable:
  – Negated (or deasserted) (0): Data Out will not change
  – Asserted (1): Data Out will become Data In on positive edge of clock

[Figure: N-bit register with inputs Data In (N bits), Write Enable, and clk, and output Data Out (N bits).]

Datapath Elements: State and Sequencing (2/3)

• Register file (regfile, RF) consists of 32 registers
  – Two 32-bit output busses: busA and busB
  – One 32-bit input bus: busW
  – In one clock cycle can read two registers and write another!
• Register is selected by:
  – RA (number) selects the register to put on busA (data)
  – RB (number) selects the register to put on busB (data)
  – RW (number) selects the register to be written via busW (data) when Write Enable is 1
• Clock input (clk)
  – Clk input is a factor ONLY during write operation
  – During read operation, behaves as a combinational logic block:
    • RA or RB valid ⇒ busA or busB valid after "access time."

[Figure: 32 x 32-bit register file with 5-bit select inputs RW, RA, and RB; 32-bit input busW; 32-bit outputs busA and busB; Write Enable and Clk inputs.]

Datapath Elements: State and Sequencing (3/3)

• "Magic" Memory
  – One input bus: Data In
  – One output bus: Data Out
• Memory word is found by:
  – For Read: Address selects the word to put on Data Out
  – For Write: Set Write Enable = 1: address selects the memory word to be written via the Data In bus
• Clock input (CLK)
  – CLK input is a factor ONLY during write operation
  – During read operation, behaves as a combinational logic block: Address valid ⇒ Data Out valid after "access time"

[Figure: memory block with inputs Address, Data In (32 bits), Write Enable, and Clk, and output Data Out (32 bits).]

State Required by RV32I ISA

Each instruction reads and updates this state during execution:
• Registers (x0..x31)
  – Register file (regfile) Reg holds 32 registers x 32 bits/register: Reg[0]..Reg[31]
  – First register read specified by rs1 field in instruction
  – Second register read specified by rs2 field in instruction
  – Write register (destination) specified by rd field in instruction
  – x0 is always 0 (writes to Reg[0] are ignored)
• Program Counter (PC)
  – Holds address of current instruction
• Memory (MEM)
  – Holds both instructions & data, in one 32-bit byte-addressed memory space
  – We'll use separate memories for instructions (IMEM) and data (DMEM)
    • These are placeholders for instruction and data caches
  – Instructions are read (fetched) from instruction memory (assume IMEM read-only)
  – Load/store instructions access data memory
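
The register file behaviour described in the slides (two combinational read ports, one write port that only acts on a clock edge, and x0 fixed at 0) can be modelled in a few lines. This is an illustrative Python sketch only; the class name `RegFile` and the method names `read` and `clock_edge` are assumptions, not part of the lecture.

```python
class RegFile:
    """Sketch of a 32 x 32-bit register file: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: return (busA, busB) for register numbers ra, rb."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """Rising clock edge: store bus_w in register rw if write_enable is 1."""
        if write_enable and rw != 0:  # writes to Reg[0] are ignored
            self.regs[rw] = bus_w & 0xFFFFFFFF  # keep the value 32 bits wide


rf = RegFile()
rf.clock_edge(rw=5, bus_w=123, write_enable=1)
rf.clock_edge(rw=0, bus_w=999, write_enable=1)  # ignored: x0 stays 0
rf.clock_edge(rw=6, bus_w=77, write_enable=0)   # ignored: write disabled
bus_a, bus_b = rf.read(5, 0)
```

Reads never wait for the clock in this model, which mirrors the "behaves as a combinational logic block" bullet; only `clock_edge` changes state.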

Review: Complete RV32I ISA

• Need datapath and control to implement these instructions

[Figure: table of the RV32I instructions; part of the table is marked "Not in CA".]

Implementing the add instruction

add rd, rs1, rs2

• Instruction makes two changes to machine's state:
  – Reg[rd] = Reg[rs1] + Reg[rs2]
  – PC = PC + 4

Bits      31-25     24-20   19-15   14-12    11-7   6-0
Field     funct7    rs2     rs1     funct3   rd     opcode
Width     7         5       5       3        5      7
Encoding  0000000   rs2     rs1     000      rd     0110011
Meaning   add       rs2     rs1     add      rd     Reg-Reg OP
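
As a rough illustration of the field layout above, the following Python sketch extracts the six R-format fields with shifts and masks and then applies the two state changes that add makes. The helper names `decode_r` and `execute_add` are hypothetical, and the register file is modelled as a plain list.

```python
def decode_r(inst):
    """Split a 32-bit R-format instruction word into its six fields."""
    return {
        "funct7": (inst >> 25) & 0x7F,  # inst[31:25]
        "rs2":    (inst >> 20) & 0x1F,  # inst[24:20]
        "rs1":    (inst >> 15) & 0x1F,  # inst[19:15]
        "funct3": (inst >> 12) & 0x7,   # inst[14:12]
        "rd":     (inst >> 7) & 0x1F,   # inst[11:7]
        "opcode": inst & 0x7F,          # inst[6:0]
    }


def execute_add(inst, regs, pc):
    """Apply add's two state changes; returns the new PC."""
    f = decode_r(inst)
    result = (regs[f["rs1"]] + regs[f["rs2"]]) & 0xFFFFFFFF  # wrap to 32 bits
    if f["rd"] != 0:  # x0 is always 0
        regs[f["rd"]] = result
    return pc + 4


# add x1, x2, x3 assembled field by field
ADD_X1_X2_X3 = (0b0000000 << 25) | (3 << 20) | (2 << 15) | (0b000 << 12) | (1 << 7) | 0b0110011

regs = [0] * 32
regs[2], regs[3] = 10, 32
new_pc = execute_add(ADD_X1_X2_X3, regs, pc=1000)
```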

Datapath for add

[Figure: single-cycle datapath for add. The clocked PC register addresses IMEM (addr in, inst out); a +4 adder produces pc+4 for the next PC. Inst[19:15] drives AddrA, Inst[24:20] drives AddrB, and Inst[11:7] drives AddrD of the clocked register file Reg[ ]. DataA (Reg[rs1]) and DataB (Reg[rs2]) feed the ALU (+), whose output alu goes to DataD. Control logic takes Inst[31:0] and sets RegWriteEnable (RegWEn) = 1.]

Bits      31-25     24-20   19-15   14-12   11-7   6-0
Encoding  0000000   rs2     rs1     000     rd     opcode
Meaning   add       rs2     rs1     add     rd     Reg-Reg OP

Reg[rd] = Reg[rs1] + Reg[rs2]
PC = PC + 4

Timing Diagram for add

Signal        First clock cycle    Second clock cycle
PC            1000                 1004
PC+4          1004                 1008
inst[31:0]    add x1,x2,x3         add x6,x7,x9
Reg[rs1]      Reg[2]               Reg[7]
Reg[rs2]      Reg[3]               Reg[9]
alu           Reg[2]+Reg[3]        Reg[7]+Reg[9]
Reg[1]        ???                  Reg[2]+Reg[3]

[Figure: waveforms of the signals above against Clock over time, shown next to the add datapath with its clock and RegWEn inputs.]

Implementing the sub instruction

sub rd, rs1, rs2

• Almost the same as add, except now have to subtract operands instead of adding them
• inst[30] selects between add and subtract

Bits   31-25     24-20   19-15   14-12   11-7   6-0
add    0000000   rs2     rs1     000     rd     0110011
sub    0100000   rs2     rs1     000     rd     0110011

Datapath for add/sub

[Figure: the add datapath (PC, +4 adder, IMEM, register file Reg[ ], ALU) with control logic that takes Inst[31:0] and drives two signals: RegWEn (1 = Write, 0 = No Write) and ALUSel (add = 0 / sub = 1).]

Implementing other R-Format instructions

• All implemented by decoding funct3 and funct7 fields and selecting appropriate ALU function

        funct7    rs2   rs1   funct3   rd   opcode
add     0000000   rs2   rs1   000      rd   0110011
sub     0100000   rs2   rs1   000      rd   0110011
sll     0000000   rs2   rs1   001      rd   0110011
slt     0000000   rs2   rs1   010      rd   0110011
sltu    0000000   rs2   rs1   011      rd   0110011
xor     0000000   rs2   rs1   100      rd   0110011
srl     0000000   rs2   rs1   101      rd   0110011
sra     0100000   rs2   rs1   101      rd   0110011
or      0000000   rs2   rs1   110      rd   0110011
and     0000000   rs2   rs1   111      rd   0110011
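
One way to picture "decoding funct3 and funct7 and selecting the appropriate ALU function" is a function that switches on those fields. The Python sketch below is an assumption-laden model, not the course's implementation: the names `alu` and `to_signed` are made up here, operands are unsigned 32-bit integers, and shifts use only the low 5 bits of the second operand, as the RISC-V base ISA specifies.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret an unsigned 32-bit value as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(funct3, funct7, a, b):
    """Select the R-format operation from funct3 and funct7 (a, b unsigned 32-bit)."""
    alt = (funct7 >> 5) & 1  # inst[30]: picks sub over add, sra over srl
    shamt = b & 0x1F         # shift amount: low 5 bits of the second operand
    if funct3 == 0b000:
        return (a - b) & MASK if alt else (a + b) & MASK  # sub / add
    if funct3 == 0b001:
        return (a << shamt) & MASK                        # sll
    if funct3 == 0b010:
        return int(to_signed(a) < to_signed(b))           # slt
    if funct3 == 0b011:
        return int(a < b)                                 # sltu
    if funct3 == 0b100:
        return a ^ b                                      # xor
    if funct3 == 0b101:
        if alt:
            return (to_signed(a) >> shamt) & MASK         # sra
        return a >> shamt                                 # srl
    if funct3 == 0b110:
        return a | b                                      # or
    return a & b                                          # and (funct3 == 0b111)


difference = alu(0b000, 0b0100000, 5, 7)  # sub: 5 - 7 wraps to 32 bits
```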

Implementing I-Format - addi instruction

• RISC-V Assembly Instruction: addi x15,x1,-50

Bits      31-20          19-15   14-12    11-7    6-0
Field     imm[11:0]      rs1     funct3   rd      opcode
Width     12             5       3        5       7
Encoding  111111001110   00001   000      01111   0010011
Meaning   imm=-50        rs1=1   add      rd=15   OP-Imm
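
The encoding above can be checked by assembling the fields in software. This Python sketch uses a hypothetical helper `encode_i` (not from the lecture) to pack the I-format fields and then splits the resulting word back into the five bit groups shown on the slide.

```python
def encode_i(imm, rs1, funct3, rd, opcode):
    """Pack I-format fields into a 32-bit word; imm is truncated to 12 bits."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


# addi x15, x1, -50
word = encode_i(imm=-50, rs1=1, funct3=0b000, rd=15, opcode=0b0010011)

# Split the 32-bit pattern into imm[11:0], rs1, funct3, rd, opcode
bits = format(word, "032b")
fields = " ".join([bits[0:12], bits[12:17], bits[17:20], bits[20:25], bits[25:32]])
```

Here `fields` reproduces the slide's bit pattern, `111111001110 00001 000 01111 0010011`: masking -50 to 12 bits gives its two's-complement form.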

Datapath for add/sub

[Figure: the add/sub datapath (PC, +4 adder, IMEM, register file Reg[ ], ALU; control signals RegWEn (1 = Write, 0 = No Write) and ALUSel (add = 0 / sub = 1)), annotated "Immediate should be here".]

Adding addi to Datapath

[Figure: the add/sub datapath extended with a two-input mux in front of the ALU: input 0 is Reg[rs2], input 1 is Imm[31:0]. Control signals: RegWEn (1 = Write, 0 = No Write), ALUSel (add = 0 / sub = 1), BSel (rs2 = 0 / Imm = 1).]

Adding addi to Datapath

[Figure: datapath for addi. An Imm. Gen block takes Inst[31:20] and produces Imm[31:0], which the BSel mux (rs2 = 0 / Imm = 1) passes to the ALU.]

Control settings for addi: RegWEn = 1, ALUSel = add, BSel = 1, ImmSel = I

I-Format immediates

• High 12 bits of instruction (inst[31:20]) copied to low 12 bits of immediate (imm[11:0])
• Immediate is sign-extended by copying value of inst[31] to fill the upper 20 bits of the immediate value (imm[31:12])

Instruction (inst[31:0]):
Bits    31-20       19-15   14-12    11-7   6-0
Field   imm[11:0]   rs1     funct3   rd     opcode
Width   12          5       3        5      7

Immediate (imm[31:0]):
Bits     31-11                       10-0
Source   inst[31] (sign extension)   inst[30:20]

[Figure: Imm. Gen block with input inst[31:20], control ImmSel = I, and output imm[31:0].]
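
The two bullets above translate directly into bit operations. The following Python sketch is illustrative only; `imm_gen_i` and `as_signed` are hypothetical names. It builds the 32-bit immediate for an I-format instruction word and checks it against the addi x15,x1,-50 example.

```python
def imm_gen_i(inst):
    """Build imm[31:0] for an I-format instruction word."""
    imm = (inst >> 20) & 0xFFF  # inst[31:20] copied to imm[11:0]
    if inst & 0x80000000:       # inst[31] is the sign bit
        imm |= 0xFFFFF000       # copy it into imm[31:12]
    return imm


def as_signed(x):
    """Read a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


# addi x15, x1, -50 (bit pattern from the addi encoding slide)
ADDI = 0b111111001110_00001_000_01111_0010011
imm = imm_gen_i(ADDI)
value = as_signed(imm)
```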

R+I Datapath

[Figure: combined datapath for R-format and I-format arithmetic: PC, +4 adder, IMEM, register file Reg[ ], Imm. Gen (input Inst[31:20], output Imm[31:0]), BSel mux, and ALU. Control logic takes Inst[31:0] and drives RegWEn, ALUSel, BSel, and ImmSel.]

Works for all other I-format arithmetic instructions (slti, sltiu, andi, ori, xori, slli, srli, srai) just by changing ALUSel

Peer Instruction

1) Program counter is a register
2) We should use the main ALU to compute PC = PC + 4 in order to save some gates
3) The ALU is a synchronous state element

    123
A:  FFF
B:  FFT
C:  FTF
D:  FTT
E:  TFF
F:  TFT
G:  TTF
H:  TTT

Add lw

• RISC-V Assembly Instruction (I-type): lw x14, 8(x2)

Bits      31-20          19-15   14-12    11-7    6-0
Field     imm[11:0]      rs1     funct3   rd      opcode
Width     12             5       3        5       7
Role      offset[11:0]   base    width    dest    LOAD
Encoding  000000001000   00010   010      01110   0000011
Meaning   imm=+8         rs1=2   LW       rd=14   LOAD

• The 12-bit signed immediate is added to the base address in register rs1 to form the memory address
• This is very similar to the add-immediate operation but used to create address not to create final result
• The value loaded from memory is stored in register rd
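
A small Python sketch of the three bullets above, under simplifying assumptions: the function name `lw` is made up, the immediate is passed as an ordinary signed integer, and data memory is a dictionary keyed by byte address that holds whole words.

```python
def lw(regs, dmem, rd, rs1, imm):
    """Model lw rd, imm(rs1): form the address, read memory, write rd."""
    addr = (regs[rs1] + imm) & 0xFFFFFFFF  # base register plus signed offset
    value = dmem[addr]                     # the word read from data memory
    if rd != 0:                            # x0 is always 0
        regs[rd] = value
    return addr


# lw x14, 8(x2) with x2 holding the base address 0x1000
regs = [0] * 32
regs[2] = 0x1000
dmem = {0x1008: 0xDEADBEEF}
addr = lw(regs, dmem, rd=14, rs1=2, imm=8)
```

The addition is the same one addi performs; the difference is that the sum is used as an address rather than written to rd.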

Adding lw to Datapath

[Figure: R+I datapath extended with a clocked data memory (DMEM). The ALU output (alu) drives the DMEM address input (addr); DataR returns the value read (mem). A two-input write-back mux controlled by WBSel produces wb, which feeds DataD of the register file. Imm. Gen takes Inst[31:20].]

Control settings for lw: RegWEn = 1, ImmSel = I, BSel = 1, ALUSel = Add, MemRW = Read, WBSel = 0

All RV32 Load Instructions

• Supporting the narrower loads requires additional logic to extract the correct byte/halfword from the value loaded from memory, and sign- or zero-extend the result to 32 bits before writing back to register file.
  – It is just a mux mod

funct3 field encodes size and 'signedness' of load data

       imm         rs1   funct3   rd   opcode
lb     imm[11:0]   rs1   000      rd   0000011
lh     imm[11:0]   rs1   001      rd   0000011
lw     imm[11:0]   rs1   010      rd   0000011
lbu    imm[11:0]   rs1   100      rd   0000011
lhu    imm[11:0]   rs1   101      rd   0000011
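
The extraction and extension logic for narrow loads might look like the following Python sketch. It is not taken from the lecture: `load_extend` is a hypothetical name, the memory is assumed to return the aligned 32-bit word that contains the address, byte lanes are little-endian (as in RISC-V), and halfword accesses are assumed to be aligned.

```python
MASK = 0xFFFFFFFF


def load_extend(word, addr, funct3):
    """Pick a byte, halfword, or word out of `word` and extend it to 32 bits.

    funct3[1:0] gives the size (0 = byte, 1 = halfword, 2 = word);
    funct3[2] = 1 selects zero extension (lbu, lhu).
    """
    size = funct3 & 0b011
    zero_extend = funct3 & 0b100
    shift = 8 * (addr & 0b11)  # little-endian byte lane within the word
    if size == 0:
        value, width = (word >> shift) & 0xFF, 8
    elif size == 1:
        value, width = (word >> shift) & 0xFFFF, 16
    else:
        return word & MASK     # lw: the whole word, nothing to extend
    if not zero_extend and value & (1 << (width - 1)):
        value |= MASK & ~((1 << width) - 1)  # copy the sign bit upward
    return value


word = 0x8070F1E2
lb_result = load_extend(word, addr=0, funct3=0b000)   # byte 0xE2, sign-extended
lbu_result = load_extend(word, addr=0, funct3=0b100)  # byte 0xE2, zero-extended
```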

Adding sw Instruction

• sw: Reads two registers, rs1 for base memory address, and rs2 for data to be stored, as well as immediate offset!

sw x14, 8(x2)

Bits      31-25            24-20    19-15   14-12    11-7            6-0
Field     imm[11:5]        rs2      rs1     funct3   imm[4:0]        opcode
Width     7                5        5       3        5               7
Role      offset[11:5]     src      base    width    offset[4:0]     STORE
Encoding  0000000          01110    00010   010      01000           0100011
Meaning   offset[11:5]=0   rs2=14   rs1=2   SW       offset[4:0]=8   STORE

combined 12-bit offset = 0000000 01000 = 8
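
To see how the split immediate is assembled, this Python sketch packs the S-format fields for sw x14, 8(x2) and prints them in the same six groups as the slide. The helper `encode_s` is a hypothetical name introduced for the example.

```python
def encode_s(imm, rs2, rs1, funct3, opcode):
    """Pack S-format fields into a 32-bit word; imm is split into two pieces."""
    imm &= 0xFFF                   # keep 12 bits (two's complement)
    return (
        ((imm >> 5) << 25)         # imm[11:5] -> inst[31:25]
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | ((imm & 0x1F) << 7)      # imm[4:0] -> inst[11:7]
        | opcode
    )


# sw x14, 8(x2)
word = encode_s(imm=8, rs2=14, rs1=2, funct3=0b010, opcode=0b0100011)

bits = format(word, "032b")
fields = " ".join(
    [bits[0:7], bits[7:12], bits[12:17], bits[17:20], bits[20:25], bits[25:32]]
)
```

The resulting `fields` string matches the slide's encoding, `0000000 01110 00010 010 01000 0100011`.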

Datapath with lw

[Figure: datapath supporting R-format, I-format arithmetic, and lw: PC, +4 adder, IMEM, register file Reg[ ], Imm. Gen (input Inst[31:20]), BSel mux, ALU, DMEM (addr, DataR), and write-back mux (mem, wb). Control logic takes Inst[31:0] and drives RegWEn, ImmSel, BSel, ALUSel, MemRW, and WBSel.]

Adding sw to Datapath

[Figure: datapath extended for sw. DMEM gains a write-data input DataW next to addr and DataR; Imm. Gen now takes Inst[31:7] so that it can reach both pieces of the S-format immediate.]

Control settings for sw: RegWEn = 0, ImmSel = S, BSel = 1, ALUSel = Add, MemRW = Write, WBSel = * (* = Don't care)

I+S Immediate Generation

I-format instruction:
Bits    31-20       19-15   14-12    11-7   6-0
Field   imm[11:0]   rs1     funct3   rd     I-opcode

S-format instruction:
Bits    31-25       24-20   19-15   14-12    11-7       6-0
Field   imm[11:5]   rs2     rs1     funct3   imm[4:0]   S-opcode

Immediate (imm[31:0]), selected by ImmSel = I/S:
Bits   31-11                       10-5          4-0
I      inst[31] (sign extension)   inst[30:25]   inst[24:20]
S      inst[31] (sign extension)   inst[30:25]   inst[11:7]

• Just need a 5-bit mux to select between two positions where low five bits of immediate can reside in instruction
• Other bits in immediate are wired to fixed positions in instruction
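
The shared wiring and the single 5-bit mux can be sketched as follows. This Python model is an illustration under assumptions: the name `imm_gen` and the string values "I" and "S" for the ImmSel control are choices made here, not the lecture's signal encoding.

```python
def imm_gen(inst, imm_sel):
    """Generate imm[31:0] for I-format or S-format instruction words."""
    sign = (inst >> 31) & 1
    upper = 0xFFFFF800 if sign else 0  # imm[31:11] = inst[31] (sign extension)
    mid = ((inst >> 25) & 0x3F) << 5   # imm[10:5] = inst[30:25], fixed wiring
    if imm_sel == "I":                 # the 5-bit mux on imm[4:0]
        low = (inst >> 20) & 0x1F      # inst[24:20]
    elif imm_sel == "S":
        low = (inst >> 7) & 0x1F       # inst[11:7]
    else:
        raise ValueError("this sketch only handles ImmSel = I or S")
    return upper | mid | low


LW_X14_8_X2 = 0b000000001000_00010_010_01110_0000011  # lw x14, 8(x2)
SW_X14_8_X2 = 0b0000000_01110_00010_010_01000_0100011  # sw x14, 8(x2)
ADDI_NEG50 = 0b111111001110_00001_000_01111_0010011   # addi x15, x1, -50

lw_imm = imm_gen(LW_X14_8_X2, "I")
sw_imm = imm_gen(SW_X14_8_X2, "S")
addi_imm = imm_gen(ADDI_NEG50, "I")
```

Only the `low` field depends on the format, which is the point the first bullet makes.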

Implementing Branches

• B-format is mostly same as S-Format, with two register sources (rs1/rs2) and a 12-bit immediate
• But now immediate represents values -4096 to +4094 in 2-byte increments
• The 12 immediate bits encode even 13-bit signed byte offsets (lowest bit of offset is always zero, so no need to store it)

Bits    31        30-25       24-20   19-15   14-12    11-8       7         6-0
Field   imm[12]   imm[10:5]   rs2     rs1     funct3   imm[4:1]   imm[11]   opcode
Width   1         6           5       5       3        4          1         7
Role    offset[12|10:5]       rs2     rs1     funct3   offset[4:1|11]       BRANCH
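
The scattered B-format immediate bits can be gathered with shifts and masks. The sketch below is illustrative: `encode_b` and `imm_gen_b` are hypothetical helpers, and the default opcode value 1100011 for BRANCH comes from the RISC-V base ISA, not from this slide.

```python
def encode_b(offset, rs2, rs1, funct3, opcode=0b1100011):
    """Pack a B-format word; offset must be even and within -4096..+4094."""
    imm = offset & 0x1FFF                # 13-bit two's complement
    return (
        (((imm >> 12) & 0x1) << 31)      # imm[12]   -> inst[31]
        | (((imm >> 5) & 0x3F) << 25)    # imm[10:5] -> inst[30:25]
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (((imm >> 1) & 0xF) << 8)      # imm[4:1]  -> inst[11:8]
        | (((imm >> 11) & 0x1) << 7)     # imm[11]   -> inst[7]
        | opcode
    )


def imm_gen_b(inst):
    """Recover the signed byte offset from a B-format instruction word."""
    imm = (
        (((inst >> 31) & 0x1) << 12)
        | (((inst >> 7) & 0x1) << 11)
        | (((inst >> 25) & 0x3F) << 5)
        | (((inst >> 8) & 0xF) << 1)     # imm[0] is always 0 and is not stored
    )
    return imm - (1 << 13) if imm & (1 << 12) else imm


# Round trip for a backward branch of 8 bytes (funct3 = 000 is beq)
offset = imm_gen_b(encode_b(-8, rs2=3, rs1=2, funct3=0b000))
```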

Datapath So Far

[Figure: datapath supporting R-format, I-format arithmetic, lw, and sw: PC, +4 adder, IMEM, register file Reg[ ], Imm. Gen (input Inst[31:7]), BSel mux, ALU, DMEM (addr, DataR, DataW), and write-back mux (mem, wb). Control logic takes Inst[31:0] and drives RegWEn, ImmSel, BSel, ALUSel, MemRW, and WBSel.]

Branches

• Different change to the state:
  – PC = PC + 4, branch not taken
  – PC = PC + immediate, branch taken
• Six branch instructions: BEQ, BNE, BLT, BGE, BLTU, BGEU
• Need to compute PC + immediate and to compare values of rs1 and rs2
  – But have only one ALU, so need more hardware

Adding Branches

[Figure: datapath extended for branches. A Branch Comp block compares Reg[rs1] and Reg[rs2], with control input BrUn and outputs BrEq and BrLT. Two further two-input muxes are added, controlled by ASel and PCSel. Imm. Gen takes Inst[31:7].]

Control settings for a branch: RegWEn = 0, ImmSel = B, ASel = 1, BSel = 1, ALUSel = Add, MemRW = Read, WBSel = * (* = Don't care), PCSel = taken / not taken

Branch Comparator

• BrEq = 1, if A = B
• BrLT = 1, if A < B
• BrUn = 1 selects unsigned comparison for BrLT, 0 = signed
• BGE branch: A >= B, if NOT (A < B), that is, when BrLT = 0

[Figure: Branch Comp block with inputs A, B, and BrUn and outputs BrEq and BrLT.]
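
A behavioural model of the comparator, and of how the six branches could use its two outputs, is sketched below in Python. The names `branch_comp`, `branch_taken`, and `next_pc` are assumptions for this example, and the lookup table is one possible way for control logic to combine BrEq and BrLT rather than the lecture's stated design.

```python
def to_signed(x):
    """Reinterpret an unsigned 32-bit value as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def branch_comp(a, b, br_un):
    """Return (BrEq, BrLT) for unsigned 32-bit inputs a and b."""
    br_eq = int(a == b)
    if br_un:
        br_lt = int(a < b)                        # unsigned comparison
    else:
        br_lt = int(to_signed(a) < to_signed(b))  # signed comparison
    return br_eq, br_lt


def branch_taken(name, a, b):
    """Decide taken / not taken from the comparator outputs."""
    br_eq, br_lt = branch_comp(a, b, br_un=name in ("bltu", "bgeu"))
    return {
        "beq": br_eq == 1,
        "bne": br_eq == 0,
        "blt": br_lt == 1,
        "bge": br_lt == 0,   # A >= B exactly when NOT (A < B)
        "bltu": br_lt == 1,
        "bgeu": br_lt == 0,
    }[name]


def next_pc(pc, imm, taken):
    """PC + immediate if the branch is taken, otherwise PC + 4."""
    return (pc + imm if taken else pc + 4) & 0xFFFFFFFF


# 0xFFFFFFFF is -1 when signed but the largest value when unsigned
signed_result = branch_taken("blt", 0xFFFFFFFF, 1)
unsigned_result = branch_taken("bltu", 0xFFFFFFFF, 1)
```

The pair of results differs only because BrUn changes how the same bit patterns are compared.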

Branch Immediates (In Other ISAs)

• 12-bit immediate encodes PC-relative offset of -4096 to +4094 bytes in multiples of 2 bytes
• Standard approach: Treat immediate as in range -2048..+2047, then shift left by 1 bit to multiply by 2 for branches

Instruction:                     s | imm[10:5] | rs2 | rs1 | funct3 | imm[4:0] | B-opcode
S-Immediate:                     sign-extension | s | imm[10:5] | imm[4:0]
B-Immediate (shift left by 1):   sign-extension | s | imm[10:5] | imm[4:0] | 0

Each instruction immediate bit can appear in one of two places in output immediate value, so need one 2-way mux per bit

