Adding custom instructions to Simplescalar/GCC architecture Somasundaram.

Date post: 14-Dec-2015

Adding custom instructions to Simplescalar/GCC architecture

Somasundaram

Agenda

• Motivation
• GCC overall architecture
• Simplescalar architecture
• Adding a custom instruction
• Conclusion

Motivation

Extensible processors
• What regular ISA instructions can be combined?
• Which regular ISA instructions are to be combined into a CFU instruction?
• Retarget the compiler to produce optimised code with CFU instructions
• Simulate the extended processor with CFU instructions

GNU Compiler Collection

Many front-ends
• C
• Fortran
• C++/Java/Ada

Backend targeted at many processors
• x86, Alpha, Sparc
• ARC, ARM, MIPS . . .

GCC Compiler Flow

[Figure: GCC compiler flow. Program → Language Front-end → GIMPLE IR → High-level IR Optimisations → RTL IR → Low-level IR Optimisations → Scheduled assembly code. Machine dependent files [.c, .h, .md] are an input to the flow.]

RTL? Are we interested in everything?

Combine small RISC-ISA-like patterns into bigger CISC-ISA-like patterns.

GCC – Low Level Optimisation

Uses Lisp-like RTL as IR

Tip: use the -da compiler option to get the IR output

Example:

    (insn 48 47 50 (set (reg/v:SI 36)
            (mult:SI (reg:SI 42)
                (reg:SI 41))) 41 {mulsi3} (nil)
        (nil))

    (call_insn 94 93 97 (parallel[
                (set (reg:SI 0 r0)
                    (call (mem:SI (symbol_ref:SI ("printf")) 0)
                        (const_int 0 [0x0])))
                (clobber (reg:SI 14 lr))
            ] ) -1 (nil)
        (nil)
        (expr_list (use (reg:SI 1 r1))
            (expr_list (use (reg:SI 0 r0))
                (nil))))
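To make the Lisp-like shape of the first insn more concrete, here is a minimal sketch in C of a toy expression tree and a printer for it. This is not GCC's real rtx structure: `toy_rtx`, `toy_code` and `toy_print` are hypothetical names invented for this illustration, and the printed form is simplified (no insn chain fields, no `/v` flag).

```c
#include <stdio.h>
#include <string.h>

/* Toy expression node, loosely modelled on RTL's nested-list shape.
   NOT GCC's rtx; every name here is hypothetical. */
enum toy_code { TOY_REG, TOY_MULT, TOY_PLUS, TOY_SET };

struct toy_rtx {
    enum toy_code code;
    int regno;                     /* used only by TOY_REG */
    const struct toy_rtx *op[2];   /* operands of MULT, PLUS and SET */
};

/* Append the printed form of x to the NUL-terminated string in buf
   (total capacity cap) and return buf. */
char *toy_print(const struct toy_rtx *x, char *buf, size_t cap)
{
    static const char *const names[] = { "reg:SI", "mult:SI", "plus:SI", "set" };
    size_t len = strlen(buf);

    if (x->code == TOY_REG) {
        snprintf(buf + len, cap - len, "(reg:SI %d)", x->regno);
        return buf;
    }

    /* Binary node: "(name operand0 operand1)" */
    snprintf(buf + len, cap - len, "(%s ", names[x->code]);
    toy_print(x->op[0], buf, cap);
    len = strlen(buf);
    snprintf(buf + len, cap - len, " ");
    toy_print(x->op[1], buf, cap);
    len = strlen(buf);
    snprintf(buf + len, cap - len, ")");
    return buf;
}
```

Building a SET node whose source is a MULT of two REG nodes and printing it into an empty buffer gives a string with the same nesting as the pattern part of insn 48.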

GCC - Target Machine Description

Use a similar language in the md [machine description] file

    (define_insn "mulsi3"
      [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
            (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
                     (match_operand:SI 1 "s_register_operand" "%?r,0")))]
      ""
      "mul%?\\t%0, %2, %1"
      [(set_attr "type" "mult")])

GCC Combine Phase

Combines some standard IR patterns into a single user-defined IR pattern

User-defined IR patterns are defined in the target.md file

Operand constraints should be satisfied

Example: MAC (Multiply-Accumulate)
• Merge mulsi3 and addsi3 into mulsi3addsi

GCC Combine Phase

How is it done?

Let us assume that the following patterns are defined in the machine description:

    addsi3        Matches C=A+B (all 32-bit regs)
    mulsi3        Matches C=A*B (all 32-bit regs)
    mulsi3addsi   Matches D=A*B+C (all 32-bit regs)
    mulsi4addsi   Matches E=A*B+C*D (all 32-bit regs)
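As a reference for what each of the four patterns computes, the following sketch writes them as plain C functions. The function names simply reuse the pattern names for readability. Using `uint32_t` (wrap-around 32-bit arithmetic) is an assumption of this sketch, chosen to avoid signed-overflow issues in C.

```c
#include <stdint.h>

/* Reference semantics of the four assumed patterns, all on 32-bit values. */

/* addsi3: C = A + B */
uint32_t addsi3(uint32_t a, uint32_t b)
{
    return a + b;
}

/* mulsi3: C = A * B */
uint32_t mulsi3(uint32_t a, uint32_t b)
{
    return a * b;
}

/* mulsi3addsi: D = A * B + C */
uint32_t mulsi3addsi(uint32_t a, uint32_t b, uint32_t c)
{
    return a * b + c;
}

/* mulsi4addsi: E = A * B + C * D */
uint32_t mulsi4addsi(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return a * b + c * d;
}
```

A combined pattern is only a valid replacement if it gives the same result as the chain it replaces, for example `mulsi4addsi(a, b, c, d)` against `addsi3(mulsi3(a, b), mulsi3(c, d))`.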

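GCC Combine Phase

Assume this DDG sub-graph:

[Figure: data dependence sub-graph. Insns 45, 47, 50 and 52 are memory loads (mem). Insn 48 (mulsi3) takes 47 and 45; insn 53 (mulsi3) takes 52 and 50; insn 55 (addsi3) takes 48 and 53.]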

GCC Combine Phase

[Figure: the same DDG sub-graph. Insn 55 (addsi3) takes insn 48 (mulsi3 of 47, 45) and insn 53 (mulsi3 of 52, 50); 45, 47, 50 and 52 are mem.]

Try to combine 48, 55 and see if a pattern which multiplies two operands and adds a third operand to the result exists.

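[Figure: DDG after combining 48 into 55. Insn 55 is now mulsi3addsi with inputs 47, 45 and 53; insn 53 is still mulsi3 of 52 and 50; 45, 47, 50 and 52 are mem.]

Try 55,45: No matching pattern
Try 55,47: No matching pattern
Try 55,53: We have a match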

GCC Combine phase

[Figure: DDG after combining 53 into 55. Insn 55 is now mulsi4addsi with inputs 47, 45, 52 and 50, all mem.]

Try 55,45: No matching pattern
Try 55,47: No matching pattern
Try 55,52: No matching pattern
Try 55,50: No matching pattern
Try 55,52,50: No matching pattern
Try 55,52,45: No matching pattern
Try 55,52,47: No matching pattern
Try 55,50,45: No matching pattern
Try 55,50,47: No matching pattern
Try 55,47,45: No matching pattern

Cannot try to combine more than 3 patterns! Hence, stop!
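The walk-through can be mimicked with a toy model in C. This is only a sketch of the idea and not GCC's combiner: it tries pairs only (a consumer and one of its producers), it knows just the two merge rules used in the example, and it assumes each merged producer has no other user. All type and function names are hypothetical.

```c
/* Toy model of the combine walk-through; NOT GCC's combine pass. */
enum kind { K_MEM, K_MULSI3, K_ADDSI3, K_MULSI3ADDSI, K_MULSI4ADDSI };

struct insn {
    int uid;               /* insn number as shown in the DDG */
    enum kind kind;        /* pattern currently matched by this insn */
    int nin;               /* number of inputs in use */
    struct insn *in[4];    /* producers feeding this insn */
    int deleted;           /* set once the insn has been merged away */
};

/* Try to merge consumer c with the producer in input slot i.
   Returns 1 if a matching combined pattern exists, else 0. */
static int try_combine(struct insn *c, int i)
{
    struct insn *p = c->in[i];

    /* addsi3 fed by a mulsi3 becomes mulsi3addsi (D = A*B + C). */
    if (c->kind == K_ADDSI3 && p->kind == K_MULSI3) {
        struct insn *addend = c->in[1 - i];
        c->kind = K_MULSI3ADDSI;
        c->in[0] = p->in[0];
        c->in[1] = p->in[1];
        c->in[2] = addend;
        c->nin = 3;
        p->deleted = 1;    /* assumes p had no other user */
        return 1;
    }

    /* mulsi3addsi whose addend is a mulsi3 becomes mulsi4addsi. */
    if (c->kind == K_MULSI3ADDSI && i == 2 && p->kind == K_MULSI3) {
        c->kind = K_MULSI4ADDSI;
        c->in[2] = p->in[0];
        c->in[3] = p->in[1];
        c->nin = 4;
        p->deleted = 1;
        return 1;
    }
    return 0;
}

/* Keep merging producers into c until no attempt matches.
   Returns the number of successful merges. */
int combine(struct insn *c)
{
    int merges = 0;
    int changed = 1;

    while (changed) {
        changed = 0;
        for (int i = 0; i < c->nin; i++) {
            if (try_combine(c, i)) {
                merges++;
                changed = 1;
                break;     /* inputs changed: restart the scan */
            }
        }
    }
    return merges;
}
```

Run on the example sub-graph (insn 55 as addsi3 over two mulsi3 insns), the toy performs two merges and leaves insn 55 as mulsi4addsi over the four loads, which is the end state shown in the slides.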

GCC Combine phase: Summary

Can combine up to 3 instructions together

Can recursively combine more instructions

Deletes a smaller instruction once combined

Always works within a single function

Retargeting GCC for CFU

Build a better Combiner phase
• Write a new combiner with a better pattern merger which works on inputs from RTL
• Replace existing combiner with this combiner

New patterns for the CFU instruction in the target.md file

Changes in GAS (included in binutils package) to generate insn. word

SimpleScalar is

• Instruction Set simulator
• Profiles programs
• Simulates micro-architectural features
• Different levels of simulation speed vs accuracy trade-off
• Written in C
• Easily retargetable

Simplescalar: CFU issues

More arguments than used by RISC instructions
• Out-of-order execution needs to take care of the increase in dependencies

New instructions in decode tree
• Easy to add new instructions to the decode tree (machine.def)

Let us add a new instruction

Achieve the operation E=A*B+C*D using one instruction

4 input operands and 1 output operand

Extension to ARM ISA

Provide
• Compiler
• Assembler
• Simulator

Pattern for the instruction

gcc/config/arm/arm.md

    (define_insn "*mulsi4addsi"
      [(set (match_operand:SI 0 "s_register_operand" "=r")
            (plus:SI
              (mult:SI (match_operand:SI 2 "s_register_operand" "r")
                       (match_operand:SI 1 "s_register_operand" "r"))
              (mult:SI (match_operand:SI 4 "s_register_operand" "r")
                       (match_operand:SI 3 "s_register_operand" "r"))))]
      ""
      "ml2a%?\\t%0, %2, %1, %4, %3"
      [(set_attr "type" "mult")])

Simplescalar changes

Instruction Decode Tree
• Chain of decoders: Each looking at a set of bits

target-arm/arm.def
• New chain of decoder macros for CFU class of instructions
• Increase the number of input dependencies in all the instruction macros from 5 to 6 (predication in ARM)
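The idea of decoders that each look at a set of bits can be sketched in C as a round trip between an instruction word and its fields. The bit layout below is purely hypothetical: the slides do not give the encoding of the new instruction, so the opcode value `0xC5`, the field positions, and the names `ml2a_fields`, `ml2a_encode` and `ml2a_decode` are all assumptions made for illustration. The sketch does not reproduce the macro format of the simulator's .def files either.

```c
#include <stdint.h>

/* HYPOTHETICAL layout for a 5-register instruction E = A*B + C*D:
   [31:28] cond  [27:20] opcode  [19:16] Re  [15:12] Ra
   [11:8]  Rb    [7:4]   Rc      [3:0]   Rd
   None of this comes from the slides or from the ARM ISA. */
#define ML2A_OPCODE 0xC5u

struct ml2a_fields {
    unsigned cond;              /* predication condition */
    unsigned re;                /* destination register */
    unsigned ra, rb, rc, rd;    /* four source registers */
};

/* Assembler side: pack the fields into one 32-bit instruction word. */
uint32_t ml2a_encode(struct ml2a_fields f)
{
    return ((uint32_t)(f.cond & 0xFu) << 28)
         | ((uint32_t)ML2A_OPCODE << 20)
         | ((uint32_t)(f.re & 0xFu) << 16)
         | ((uint32_t)(f.ra & 0xFu) << 12)
         | ((uint32_t)(f.rb & 0xFu) << 8)
         | ((uint32_t)(f.rc & 0xFu) << 4)
         |  (uint32_t)(f.rd & 0xFu);
}

/* Simulator side: the first decoder checks the opcode bits, then the
   register fields are extracted. Returns 1 on a match, 0 otherwise. */
int ml2a_decode(uint32_t word, struct ml2a_fields *out)
{
    if (((word >> 20) & 0xFFu) != ML2A_OPCODE)
        return 0;

    out->cond = (word >> 28) & 0xFu;
    out->re   = (word >> 16) & 0xFu;
    out->ra   = (word >> 12) & 0xFu;
    out->rb   = (word >> 8) & 0xFu;
    out->rc   = (word >> 4) & 0xFu;
    out->rd   = word & 0xFu;
    return 1;
}
```

Whatever the real encoding is, the assembler and the simulator's decoder have to agree on it; a round-trip check like encode followed by decode is a cheap way to catch a mismatch.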

Simplescalar changes

sim-outorder.c
• Increase the number of input dependencies to be monitored in the reservation unit
• Both macros and code have to be changed

Other files need to be changed for the same purpose

Compile 'test program' and verify!
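The effect of widening the input-dependency list can be sketched with a toy reservation-unit entry in C. This is not the data structure from sim-outorder.c: `toy_rs_entry`, `TOY_MAX_IDEPS` and the three functions are hypothetical names, and the limit of 6 is taken from the figure quoted on the decode-tree slide.

```c
#include <stdbool.h>

/* Hypothetical per-instruction limit on tracked input dependencies. */
#define TOY_MAX_IDEPS 6

/* Toy reservation-unit entry; NOT the structure used by sim-outorder.c. */
struct toy_rs_entry {
    int  idep_reg[TOY_MAX_IDEPS];     /* source register, -1 if slot unused */
    bool idep_ready[TOY_MAX_IDEPS];   /* has this input been produced yet? */
};

/* Fill an entry from a list of source registers (extra sources beyond
   TOY_MAX_IDEPS are ignored in this sketch). */
void toy_rs_init(struct toy_rs_entry *e, const int *srcs, int nsrcs)
{
    for (int i = 0; i < TOY_MAX_IDEPS; i++) {
        bool used = i < nsrcs;
        e->idep_reg[i] = used ? srcs[i] : -1;
        e->idep_ready[i] = !used;     /* unused slots never block issue */
    }
}

/* A producer has written register reg: mark matching inputs ready. */
void toy_rs_wakeup(struct toy_rs_entry *e, int reg)
{
    for (int i = 0; i < TOY_MAX_IDEPS; i++) {
        if (e->idep_reg[i] == reg)
            e->idep_ready[i] = true;
    }
}

/* The instruction may issue only when every input is ready. */
bool toy_rs_can_issue(const struct toy_rs_entry *e)
{
    for (int i = 0; i < TOY_MAX_IDEPS; i++) {
        if (!e->idep_ready[i])
            return false;
    }
    return true;
}
```

With four source registers, as in E=A*B+C*D, the entry stays blocked until all four producers have woken it up, which is the extra bookkeeping the out-of-order core has to handle.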

Summary

• Identify the ways to add new instructions to Simplescalar and GCC
• Determine the capabilities of the current combiner in GCC
• Demonstrate the addition of a new custom instruction
• Understand GCC to some extent!

