Computer Architecture
ELE 475 / COS 475
Slide Deck 1: Introduction and Instruction Set Architectures
David Wentzlaff
Department of Electrical Engineering
Princeton University
What is Computer Architecture?
In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using manufacturing technologies.
[Figure: Application and Physics, separated by a gap labeled "Gap too large to bridge in one step"]
Abstractions in Modern Computing Systems
• Physics
• Devices
• Circuits
• Gates
• Register-Transfer Level
• Microarchitecture
• Instruction Set Architecture
• Operating System/Virtual Machines
• Programming Language
• Algorithm
• Application
[Label on the figure: Computer Architecture (ELE 475)]
Computer Architecture is Constantly Changing
[Figure: the abstraction-layer stack, from Physics up to Application]
Application Requirements:
• Suggest how to improve architecture
• Provide revenue to fund development
Technology Constraints:
• Restrict what can be done efficiently
• New technologies make new arch possible
Architecture provides feedback to guide application and technology research directions
Computers Then…
IAS Machine. Design directed by John von Neumann. First booted in Princeton, NJ in 1952.
Image: Smithsonian Institution Archives (Smithsonian Image 95-06151)
Computers Now
Robots, Supercomputers, Automobiles, Laptops, Set-top boxes, Smartphones, Servers, Media Players, Sensor Nets, Routers, Cameras, Games, Audio Assistants
Major Technology Generations
[Figure, from Kurzweil: technology generations labeled Electromechanical, Relays, Vacuum Tubes, Bipolar, pMOS, nMOS, CMOS]
Sequential Processor Performance
[Figure: sequential processor performance plot, annotated "RISC" and "Move to multi-processor". From Hennessy and Patterson, Ed. 5. Image Copyright © 2011, Elsevier Inc. All rights Reserved.]
Course Administration
Instructor: Prof. David Wentzlaff ([email protected])
  Office: EQuad B228
  Office Hours: Mon. 1:30-2:30pm, EQuad B228
TA: Ang Li ([email protected])
  Office Hours: TBD, EQuad F210
Lectures: Monday & Wednesday 11am-12:20pm, EQuad B205
Text: Computer Architecture: A Quantitative Approach, Hennessy and Patterson, 5th Edition (2012)
  Modern Processor Design: Fundamentals of Superscalar Processors (2004), John P. Shen and Mikko H. Lipasti
Prerequisite: ELE 375 & ELE 206
Course Webpage: https://eleclass.princeton.edu/classes/ele475/spring_2018/
Course Structure
• Midterm (20%)
• Final Exam (30%)
• Labs (25%)
  – 4 Design labs (Verilog)
• Design Project (20%)
  – Using OpenPiton
  – In small groups
• Class Participation (5%)
• Ungraded Problem Sets (5 PS's) (0%)
  – Very useful for exam preparation
Course Content: Computer Organization (COS/ELE 375)
Computer Organization
• Basic Pipelined Processor
~50,000 Transistors
Photo of Berkeley RISC I, © University of California (Berkeley)
Course Content: Computer Architecture (ELE 475)
Intel Nehalem Processor, Original Core i7, ~700,000,000 Transistors
Image Credit Intel: http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg
[Label on the die photo: Computer Organization (ELE 375) Processor]
• Instruction Level Parallelism
  – Superscalar
  – Very Long Instruction Word (VLIW)
• Long Pipelines (Pipeline Parallelism)
• Advanced Memory and Caches
• Data Level Parallelism
  – Vector
  – GPU
• Thread Level Parallelism
  – Multithreading
  – Multiprocessor
  – Multicore
  – Manycore
Architecture vs. Microarchitecture
"Architecture" / Instruction Set Architecture:
• Programmer visible state (Memory & Register)
• Operations (Instructions and how they work)
• Execution Semantics (interrupts)
• Input/Output
• Data Types/Sizes
Microarchitecture / Organization:
• Tradeoffs on how to implement ISA for some metric (Speed, Energy, Cost)
• Examples: Pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths
Software Developments
up to 1955: Libraries of numerical routines
  - Floating point operations
  - Transcendental functions
  - Matrix manipulation, equation solvers, ...
1955-60: High level Languages - Fortran 1956
  Operating Systems
  - Assemblers, Loaders, Linkers, Compilers
  - Accounting programs to keep track of usage and charges
Machines required experienced operators
• Most users could not be expected to understand these programs, much less write them
• Machines had to be sold with a lot of resident software
Compatibility Problem at IBM
By the early 1960s, IBM had 4 incompatible lines of computers!
  701 ⇒ 7094
  650 ⇒ 7074
  702 ⇒ 7080
  1401 ⇒ 7010
Each system had its own
• Instruction set
• I/O system and Secondary Storage: magnetic tapes, drums and disks
• Assemblers, compilers, libraries, ...
• Market niche: business, scientific, real time, ...
⇒ IBM 360
IBM 360: A General-Purpose Register (GPR) Machine
• Processor State
  – 16 General-Purpose 32-bit Registers
    • may be used as index and base register
    • Register 0 has some special properties
  – 4 Floating Point 64-bit Registers
  – A Program Status Word (PSW)
    • PC, Condition codes, Control flags
• A 32-bit machine with 24-bit addresses
  – But no instruction contains a 24-bit address!
• Data Formats
  – 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words
The IBM 360 is why bytes are 8 bits long today!
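The remark that no instruction contains a 24-bit address can be illustrated with base-plus-displacement addressing. The following is a minimal Python sketch, not taken from the slides. It assumes the S/360 style in which an address is formed from a base register, an index register, and a 12-bit displacement, and it assumes that naming register 0 as base or index contributes zero (one reading of the "special properties" bullet). The function name and the register values are hypothetical.

```python
ADDRESS_BITS = 24        # addresses are 24 bits wide
DISPLACEMENT_BITS = 12   # assumed width of the displacement field

def effective_address(regs, base, index, displacement):
    """Form a 24-bit address from base + index + displacement."""
    if not 0 <= displacement < (1 << DISPLACEMENT_BITS):
        raise ValueError("displacement must fit in 12 bits")
    # Assumption: register 0 named as base or index contributes zero.
    base_value = regs[base] if base != 0 else 0
    index_value = regs[index] if index != 0 else 0
    total = base_value + index_value + displacement
    return total & ((1 << ADDRESS_BITS) - 1)   # keep the low 24 bits

regs = [0] * 16
regs[0] = 0xDEAD      # ignored when register 0 is named as base or index
regs[12] = 0x012000   # hypothetical base register contents
regs[3] = 0x000010    # hypothetical index register contents
addr = effective_address(regs, base=12, index=3, displacement=0x4A8)
```

The instruction itself only carries two 4-bit register numbers and a 12-bit displacement; the full 24-bit address exists only after the registers are read.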
IBM 360: Initial Implementations

                 Model 30            . . .   Model 70
Storage          8K - 64 KB                  256K - 512 KB
Datapath         8-bit                       64-bit
Circuit Delay    30 nsec/level               5 nsec/level
Local Store      Main Store                  Transistor Registers
Control Store    Read only 1 µsec            Conventional circuits

The IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models.
Milestone: The first true ISA designed as a portable hardware-software interface!
With minor modifications it still survives today!
IBM 360: Over 50 years later… The zSeries z14 Microprocessor
• 5.2 GHz in IBM 14 nm SOI CMOS technology
• 6.1 billion transistors in 696 mm²
• 64-bit virtual addressing
  – original S/360 was 24-bit, and S/370 was 31-bit extension
• 10-core design
• 6-fetch/cycle
• 10-issue/cycle out-of-order superscalar pipeline
• Out-of-order memory accesses
• Redundant datapaths
  – every instruction performed in two parallel datapaths and results compared
• 128 KB L1 I-cache, 128 KB L1 D-cache on-chip
• 2 MB private I-cache L2 per core
• 4 MB private D-cache L2 per core
• On-chip 128 MB eDRAM L3 cache
• Up to 672 MB eDRAM L4
Image Credit: IBM. Courtesy of International Business Machines Corporation, © International Business Machines Corporation.
Same Architecture, Different Microarchitecture
AMD Phenom X4
• X86 Instruction Set
• Quad Core
• 125 W
• Decode 3 Instructions/Cycle/Core
• 64 KB L1 I Cache, 64 KB L1 D Cache
• 512 KB L2 Cache
• Out-of-order
• 2.6 GHz
Image Credit: AMD
Intel Atom
• X86 Instruction Set
• Single Core
• 2 W
• Decode 2 Instructions/Cycle/Core
• 32 KB L1 I Cache, 24 KB L1 D Cache
• 512 KB L2 Cache
• In-order
• 1.6 GHz
Image Credit: Intel
Different Architecture, Different Microarchitecture
AMD Phenom X4
• X86 Instruction Set
• Quad Core
• 125 W
• Decode 3 Instructions/Cycle/Core
• 64 KB L1 I Cache, 64 KB L1 D Cache
• 512 KB L2 Cache
• Out-of-order
• 2.6 GHz
Image Credit: AMD
IBM POWER7
• Power Instruction Set
• Eight Core
• 200 W
• Decode 6 Instructions/Cycle/Core
• 32 KB L1 I Cache, 32 KB L1 D Cache
• 256 KB L2 Cache
• Out-of-order
• 4.25 GHz
Image Credit: IBM. Courtesy of International Business Machines Corporation, © International Business Machines Corporation.
Where Do Operands Come from And Where Do Results Go?
[Figure: four machine models, each drawn as a processor containing an ALU and connected to memory; the stack model also shows a top-of-stack (TOS) marker]
Number of Explicitly Named Operands:
• Stack: 0
• Accumulator: 1
• Register-Memory: 2 or 3
• Register-Register: 2 or 3
Stack-Based Instruction Set Architecture (ISA)
• Burroughs B5000 (1960)
• Burroughs B6700
• HP 3000
• ICL 2900
• Symbolics 3600
• Inmos Transputer
Modern
• Forth machines
• Java Virtual Machine
• Intel x87 Floating Point Unit
[Figure: stack machine with processor, ALU, top-of-stack (TOS) marker, and memory]
Evaluation of Expressions
Expression: (a + b * c) / (a + d * c - e)
[Figure: expression tree with "/" at the root; the left subtree is a + b * c and the right subtree is a + d * c - e]
Reverse Polish: a b c * + a d c * + e - /
Evaluation Stack (first steps, bottom of stack on the left):
  push a      a
  push b      a, b
  push c      a, b, c
  multiply    a, b * c
  add         a + b * c
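The evaluation-stack walk-through can be sketched in a few lines of Python. This is an illustrative sketch, not code from the course: the function name `eval_rpn` and the sample values bound to a through e are assumptions chosen so the arithmetic is easy to check by hand.

```python
import operator

# Binary operators that appear in the Reverse Polish string.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_rpn(tokens, env):
    """Evaluate a Reverse Polish token list using an explicit stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()          # top of stack is the right operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])       # "push a": push the variable's value
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

# Hypothetical values for the variables in the slide's expression.
env = {"a": 2.0, "b": 3.0, "c": 4.0, "d": 5.0, "e": 6.0}
result = eval_rpn("a b c * + a d c * + e - /".split(), env)
```

With these values the numerator is 2 + 3 * 4 = 14 and the denominator is 2 + 5 * 4 - 6 = 16. Note that the operators never name their operands; they always act on the top of the stack.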
Hardware organization of the stack
• Stack is part of the processor state
  ⇒ stack must be bounded and small ≈ number of Registers, not the size of main memory
• Conceptually stack is unbounded
  ⇒ a part of the stack is included in the processor state; the rest is kept in the main memory
Stack Operations and Implicit Memory References
• Suppose the top 2 elements of the stack are kept in registers and the rest is kept in the memory.
    Each push operation ⇒ 1 memory reference
    Each pop operation ⇒ 1 memory reference
  No Good!
• Better performance by keeping the top N elements in registers, and memory references are made only when register stack overflows or underflows.
  Issue: when to Load/Unload registers?
Stack Size and Memory References
Program: a b c * + a d c * + e - /

program   stack (size = 2)   memory refs
push a    R0                 a
push b    R0 R1              b
push c    R0 R1 R2           c, ss(a)
*         R0 R1              sf(a)
+         R0
push a    R0 R1              a
push d    R0 R1 R2           d, ss(a+b*c)
push c    R0 R1 R2 R3        c, ss(a)
*         R0 R1 R2           sf(a)
+         R0 R1              sf(a+b*c)
push e    R0 R1 R2           e, ss(a+b*c)
-         R0 R1              sf(a+b*c)
/         R0

4 stores, 4 fetches (implicit)
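The implicit memory traffic in this trace can be counted with a small simulation. This is a hedged Python sketch rather than the slide's own model: it reads ss(x) and sf(x) as an implicit store and fetch of stack entry x, assumes the deepest register entry is spilled when a push finds the registers full, and assumes one entry is refilled eagerly after each operator. The function name `count_stack_traffic` is hypothetical.

```python
OPERATORS = {"+", "-", "*", "/"}

def count_stack_traffic(tokens, num_regs):
    """Count implicit stack stores and fetches for a Reverse Polish program.

    regs holds the top-of-stack entries kept in registers (last item = TOS);
    mem holds deeper entries that were spilled to memory.
    """
    regs, mem = [], []
    stores = fetches = 0
    for tok in tokens:
        if tok in OPERATORS:
            right = regs.pop()
            left = regs.pop()
            regs.append(f"({left}{tok}{right})")
            # Assumed policy: refill one register as soon as a slot frees up.
            if mem:
                regs.insert(0, mem.pop())
                fetches += 1
        else:
            # Assumed policy: spill the deepest register entry when full.
            if len(regs) == num_regs:
                mem.append(regs.pop(0))
                stores += 1
            regs.append(tok)
    return stores, fetches

program = "a b c * + a d c * + e - /".split()
small = count_stack_traffic(program, num_regs=2)
large = count_stack_traffic(program, num_regs=4)
```

Under these assumptions a 2-entry register stack produces the same totals as the slide, and a 4-entry register stack produces no implicit traffic, since the expression never needs more than four live entries.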
Stack Size and Expression Evaluation
Program: a b c * + a d c * + e - /

program   stack (size = 4)
push a    R0
push b    R0 R1
push c    R0 R1 R2
*         R0 R1
+         R0
push a    R0 R1
push d    R0 R1 R2
push c    R0 R1 R2 R3
*         R0 R1 R2
+         R0 R1
push e    R0 R1 R2
-         R0 R1
/         R0

a and c are "loaded" twice ⇒ not the best use of registers!
Machine Model Summary
[Figure: the four machine models (Stack, Accumulator, Register-Memory, Register-Register), each a processor with an ALU connected to memory]
Code for C = A + B in each model:

Stack       Accumulator   Register-Memory   Register-Register
Push A      Load A        Load R1, A        Load R1, A
Push B      Add B         Add R3, R1, B     Load R2, B
Add         Store C       Store R3, C       Add R3, R1, R2
Pop C                                       Store R3, C
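The four code sequences for C = A + B can be mirrored one instruction per line in Python. This is only a sketch to make the operand-naming differences concrete: memory is modeled as a dictionary keyed by variable name, and all function names are hypothetical.

```python
def stack_machine(mem):
    stack = []
    stack.append(mem["A"])                    # Push A
    stack.append(mem["B"])                    # Push B
    stack.append(stack.pop() + stack.pop())   # Add (both operands implicit)
    mem["C"] = stack.pop()                    # Pop C

def accumulator_machine(mem):
    acc = mem["A"]                            # Load A
    acc += mem["B"]                           # Add B (accumulator implicit)
    mem["C"] = acc                            # Store C

def register_memory_machine(mem):
    regs = {}
    regs["R1"] = mem["A"]                     # Load R1, A
    regs["R3"] = regs["R1"] + mem["B"]        # Add R3, R1, B (B read from memory)
    mem["C"] = regs["R3"]                     # Store R3, C

def register_register_machine(mem):
    regs = {}
    regs["R1"] = mem["A"]                     # Load R1, A
    regs["R2"] = mem["B"]                     # Load R2, B
    regs["R3"] = regs["R1"] + regs["R2"]      # Add R3, R1, R2 (registers only)
    mem["C"] = regs["R3"]                     # Store R3, C

MODELS = [stack_machine, accumulator_machine,
          register_memory_machine, register_register_machine]

def run_all(a, b):
    """Run C = A + B on every model and collect the value stored to C."""
    results = {}
    for model in MODELS:
        mem = {"A": a, "B": b, "C": None}
        model(mem)
        results[model.__name__] = mem["C"]
    return results

results = run_all(2, 3)
```

All four models compute the same value; they differ in how many operands each Add names explicitly and in whether Add may touch memory.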
Classes of Instructions
• Data Transfer
  – LD, ST, MFC1, MTC1, MFC0, MTC0
• ALU
  – ADD, SUB, AND, OR, XOR, MUL, DIV, SLT, LUI
• Control Flow
  – BEQZ, JR, JAL, TRAP, ERET
• Floating Point
  – ADD.D, SUB.S, MUL.D, C.LT.D, CVT.S.W
• Multimedia (SIMD)
  – ADD.PS, SUB.PS, MUL.PS, C.LT.PS
• String
  – REP MOVSB (x86)
Addressing Modes: How to Get Operands from Memory

Addressing Mode     Instruction               Function
Register            Add R4, R3, R2            Regs[R4] <- Regs[R3] + Regs[R2] **
Immediate           Add R4, R3, #5            Regs[R4] <- Regs[R3] + 5 **
Displacement        Add R4, R3, 100(R1)       Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1]]
Register Indirect   Add R4, R3, (R1)          Regs[R4] <- Regs[R3] + Mem[Regs[R1]]
Absolute            Add R4, R3, (0x475)       Regs[R4] <- Regs[R3] + Mem[0x475]
Memory Indirect     Add R4, R3, @(R1)         Regs[R4] <- Regs[R3] + Mem[Mem[Regs[R1]]]
PC relative         Add R4, R3, 100(PC)       Regs[R4] <- Regs[R3] + Mem[100 + PC]
Scaled              Add R4, R3, 100(R1)[R5]   Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1] + Regs[R5] * 4]

** May not actually access memory!
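The Function column of the addressing-mode table translates almost directly into code. The Python sketch below is illustrative only: the function name `fetch_operand`, the keyword names, and the register and memory contents are all assumptions, and memory is modeled as a dictionary from address to value.

```python
def fetch_operand(mode, regs, mem, pc, **f):
    """Return the last source operand selected by one addressing mode."""
    if mode == "register":            # Regs[R2]
        return regs[f["reg"]]
    if mode == "immediate":           # #5
        return f["imm"]
    if mode == "displacement":        # Mem[100 + Regs[R1]]
        return mem[f["disp"] + regs[f["base"]]]
    if mode == "register_indirect":   # Mem[Regs[R1]]
        return mem[regs[f["base"]]]
    if mode == "absolute":            # Mem[0x475]
        return mem[f["addr"]]
    if mode == "memory_indirect":     # Mem[Mem[Regs[R1]]]
        return mem[mem[regs[f["base"]]]]
    if mode == "pc_relative":         # Mem[100 + PC]
        return mem[f["disp"] + pc]
    if mode == "scaled":              # Mem[100 + Regs[R1] + Regs[R5] * 4]
        scale = f.get("scale", 4)
        return mem[f["disp"] + regs[f["base"]] + regs[f["index"]] * scale]
    raise ValueError(f"unknown addressing mode: {mode}")

# Hypothetical machine state for the example.
regs = {"R1": 8, "R2": 7, "R3": 1, "R5": 2}
mem = {8: 108, 108: 11, 116: 22, 0x475: 33}
pc = 16

# Add R4, R3, 100(R1): Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1]]
regs["R4"] = regs["R3"] + fetch_operand("displacement", regs, mem, pc,
                                        disp=100, base="R1")
```

The register and immediate cases never index `mem`, which matches the table's footnote that those modes may not access memory at all.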
Data Types and Sizes
• Types
  – Binary Integer
  – Binary Coded Decimal (BCD)
  – Floating Point
    • IEEE 754
    • Cray Floating Point
    • Intel Extended Precision (80-bit)
  – Packed Vector Data
  – Addresses
• Width
  – Binary Integer (8-bit, 16-bit, 32-bit, 64-bit)
  – Floating Point (32-bit, 40-bit, 64-bit, 80-bit)
  – Addresses (16-bit, 24-bit, 32-bit, 48-bit, 64-bit)
ISA Encoding
Fixed Width: Every Instruction has same width
• Easy to decode (RISC Architectures: MIPS, PowerPC, SPARC, ARM…)
  Ex: MIPS, every instruction 4-bytes
Variable Length: Instructions can vary in width
• Takes less space in memory and caches (CISC Architectures: IBM 360, x86, Motorola 68k, VAX…)
  Ex: x86, instructions 1-byte up to 17-bytes
Mostly Fixed or Compressed:
• Ex: MIPS16, THUMB (only two formats, 2 and 4 bytes)
• PowerPC and some VLIWs (Store instructions compressed, decompress into Instruction Cache)
(Very) Long Instruction Word:
• Multiple instructions in a fixed width bundle
• Ex: Multiflow, HP/ST Lx, TI C6000
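One reason fixed-width encodings are easy to decode is that instruction boundaries are known without looking at the bytes. The Python sketch below contrasts the two cases. The variable-length format here is a toy invented for illustration (the first byte of each instruction gives its total length); it is not the x86 encoding, and the function names are hypothetical.

```python
FIXED_WIDTH = 4  # bytes per instruction, as in the MIPS example

def fixed_width_starts(code):
    """Every instruction starts at a multiple of the fixed width."""
    return list(range(0, len(code), FIXED_WIDTH))

def variable_length_starts(code):
    """Toy format (an assumption, not x86): the first byte of each
    instruction holds that instruction's total length in bytes."""
    starts = []
    offset = 0
    while offset < len(code):
        length = code[offset]
        if length == 0 or offset + length > len(code):
            raise ValueError(f"bad length byte at offset {offset}")
        starts.append(offset)
        offset += length   # the next start depends on decoding this one
    return starts

fixed_code = bytes(12)                                   # three 4-byte slots
variable_code = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 1])    # lengths 1, 3, 2, 1
fixed_starts = fixed_width_starts(fixed_code)
variable_starts = variable_length_starts(variable_code)
```

In the fixed-width case every start can be computed independently, while in the variable-length case each start is only known after the previous instruction's length has been decoded.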
x86 (IA-32) Instruction Encoding

Field                   Size
Instruction Prefixes    Up to four prefixes (1 byte each)
Opcode                  1, 2, or 3 bytes
ModR/M                  1 byte (if needed)
Scale, Index, Base      1 byte (if needed)
Displacement            0, 1, 2, or 4 bytes
Immediate               0, 1, 2, or 4 bytes

x86 and x86-64 Instruction Formats: possible instructions 1 to 18 bytes long
MIPS64 Instruction Encoding
[Figure: MIPS64 instruction formats. Image Copyright © 2011, Elsevier Inc. All rights Reserved.]
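Because the figure itself did not survive extraction, the following Python sketch shows how a fixed 32-bit MIPS word splits into fields. It assumes the classic MIPS R-type layout (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code), which is standard MIPS but is not spelled out in the slide text. The function name `decode_r_type` is hypothetical.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26
        "rs":     (word >> 21) & 0x1F,   # bits 25..21, first source register
        "rt":     (word >> 16) & 0x1F,   # bits 20..16, second source register
        "rd":     (word >> 11) & 0x1F,   # bits 15..11, destination register
        "shamt":  (word >> 6) & 0x1F,    # bits 10..6, shift amount
        "funct":  word & 0x3F,           # bits 5..0, function code
    }

# 0x012A4020 encodes "add $t0, $t1, $t2" ($t0 = 8, $t1 = 9, $t2 = 10).
fields = decode_r_type(0x012A4020)
```

Every field sits at a fixed bit position, so the decoder is a handful of shifts and masks with no dependence on earlier instructions.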
Real World Instruction Sets

Arch        Type      # Oper  # Mem  Data Size        # Regs  Addr Size   Use
Alpha       Reg-Reg   3       0      64-bit           32      64-bit      Workstation
ARM         Reg-Reg   3       0      32/64-bit        16      32/64-bit   Cell Phones, Embedded
MIPS        Reg-Reg   3       0      32/64-bit        32      32/64-bit   Workstation, Embedded
SPARC       Reg-Reg   3       0      32/64-bit        24-32   32/64-bit   Workstation
TI C6000    Reg-Reg   3       0      32-bit           32      32-bit      DSP
IBM 360     Reg-Mem   2       1      32-bit           16      24/31/64    Mainframe
x86         Reg-Mem   2       1      8/16/32/64-bit   4/8/24  16/32/64    Personal Computers
VAX         Mem-Mem   3       3      32-bit           16      32-bit      Minicomputer
Mot. 6800   Accum.    1       1/2    8-bit            0       16-bit      Microcontroller
Why the Diversity in ISAs?
Technology Influenced ISA
• Storage is expensive, tight encoding important
• Reduced Instruction Set Computer
  – Remove instructions until whole computer fits on die
• Multicore/Manycore
  – Transistors not turning into sequential performance
Application Influenced ISA
• Instructions for Applications
  – DSP instructions
• Compiler Technology has improved
  – SPARC Register Windows no longer needed
  – Compiler can register allocate effectively
Recap
• ISA vs Microarchitecture
• ISA Characteristics
  – Machine Models
  – Encoding
  – Data Types
  – Instructions
  – Addressing Modes
[Figure: the abstraction-layer stack, from Physics up to Application, labeled "Computer Architecture Lecture 1"]
Next Class: Microcode and Review of Pipelining
Acknowledgements
• These slides contain material developed and copyrighted by:
  – Arvind (MIT)
  – Krste Asanovic (MIT/UCB)
  – Joel Emer (Intel/MIT)
  – James Hoe (CMU)
  – John Kubiatowicz (UCB)
  – David Patterson (UCB)
  – Christopher Batten (Cornell)
• MIT material derived from course 6.823
• UCB material derived from course CS252 & CS152
• Cornell material derived from course ECE 4750
Copyright © 2018 David Wentzlaff