Generating a software loop with memory accesses
TigerSHARC assembly syntax
04/21/23 TigerSHARC assemble code 2, M. Smith, ECE, University of Calgary, Canada
Concepts
Learning just enough TigerSHARC assembly code to make a software loop “work”
Comparing the timings for rectification of integer and floating-point arrays using debug C++ code, release C++ code, and our FIRST_ASM code
Looking in “MIXED mode” at the code generated by the compiler
Test Driven Development
Describe Requirements
Design Solution
Build Solution
Test Solution
Write Acceptance Tests
Write Unit Tests
CUSTOMER
DEVELOPER
Work with customer to check that the tests properly express what the customer wants done. Iterative process with customer “heavily involved” – “Agile” methodology.
Note
Special marker
Compiler optimization
FLOATS: 927 versus 304 (THREE FOLD)
INTS: 960 versus 150 (SIX FOLD)
Why the difference? Can we do better, and do we want to?
Note the failures. What are they?
Write tests about passing values back from an assembly code routine
More detailed look at the code
Single semi-colons. Double semi-colons
Start function label. End function label. Used for “profiling code”
Label format similar to 68K. Needs leading underscore and final colon
As with 68K and Blackfin, needs a .section, but name and format are different
As with 68K, need .align statement. Is the “4” in bytes (8 bits) or words (32 bits)?
As with 68K, need .global to tell other code that this function exists
Return registers
There are many, depending on what you need to return
Here we need to use J8 as the return register to pass back an “integer” pointer
Many registers available, so we need the ability to control usage
J0 to J31: registers (integers and pointers) (SISD mode)
XR0 to XR31: registers (integers) (SISD mode)
XFR0 to XFR31: registers (floats) (SISD mode)
Did I also mention
K0 to K31: registers (integers and pointers) (SISD mode)
YR0 to YR31, YFR0 to YFR31 (SIMD mode)
XYR, YXR and R registers (SIMD mode)
And also the MIMD modes
And the double registers and the quad registers ...
#define return_pt_J8 J8 // J8 is a VOLATILE, NON-PRESERVED register
Parameter passing
SPACES for first four parameters ARE ALWAYS present on the stack (as with 68K)
But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS and Blackfin)
The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) as the first step when assembly code functions call assembly code functions
J4, J5, J6 and J7 are volatile, non-preserved registers
Can we pass back the start of the final array?
Still passing tests by accident, and this needs to be a conditional return value
What we need to know based on experiences from other processors
Can we return from an assembly language routine without crashing the processor?
Return a parameter from an assembly language routine (Is it the same for ints and floats?)
Pass parameters into assembly language (Is it the same for ints and floats?)
Do IF THEN ELSE statements
Read and write values to memory
Read and write values in a loop
Do some mathematics on the values fetched from memory
All this stuff is demonstrated by coding HalfWaveRectifyASM( )
Why is ELSE a keyword
FOUR PART ELSE INSTRUCTION IS LEGAL
IF JLT; ELSE, J1 = J2 + J3;      // Conditional execution – if true
        ELSE, XR1 = XR2 + XR3;   // Conditional – if true
        YFR1 = YFR2 + YFR3;;     // Unconditional -- always

IF JLT; DO, J1 = J2 + J3;        // Conditional execution -- if true
        DO, XR1 = XR2 + XR3;     // Conditional -- if true
        YFR1 = YFR2 + YFR3;;     // Unconditional -- always
Having this sort of format means that the instruction pipeline is not disrupted when we do IF statements
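As a rough C++ analogy (not TigerSHARC code), the difference between jumping around an instruction and conditionally executing it can be sketched as below. The function names are hypothetical, and whether a C++ compiler really emits a jump-free form for the second version depends on the compiler and the target.

```cpp
#include <cassert>

// Branching form: the addition is skipped by jumping around it.
int AddIfNegative_Branch(int test, int j1, int j2, int j3) {
    if (test < 0) {
        j1 = j2 + j3;
    }
    return j1;
}

// Conditional-execution form: the sum is always computed, and the
// condition only decides whether the result is kept.
int AddIfNegative_Select(int test, int j1, int j2, int j3) {
    const int sum = j2 + j3;
    return (test < 0) ? sum : j1;
}
```

Both functions return the same value for the same inputs; only the control flow differs.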
Label name is not the problem
NOTE: This is “C-like” syntax, but it is not “C”
Statement must end in ;; not ;
ONE semicolon = end of instruction
TWO semicolons = end of parallel instruction line
Add dual semi-colons everywhere. Worry about “multiple issues” later
This dual semi-colon is so important that you MUST code review for it all the time, or else you waste so much time in the Lab. Key in exams / quizzes
At last an error I know how to fix
Well I thought I understood it !!!
Speed issue – JUMP instructions can’t be too close together when stored in memory
Not normally a problem when “if” code is larger
Add a single instruction line of 4 NOPs (TEMPORARY): nop; nop; nop; nop;;
Fix the last error as part of Assignment 1
Fix the remaining error in handling the IF THEN ELSE as part of Assignment 1
Worry about code efficiency later (refactor) when all code is working
What we need to know based on experiences from other processors
Can we return from an assembly language routine without crashing the processor?
Return a parameter from an assembly language routine (Is it the same for ints and floats?)
Pass parameters into assembly language (Is it the same for ints and floats?)
Do IF THEN ELSE statements
Read and write values to memory
Read and write values in a loop
Do some mathematics on the values fetched from memory
All this stuff is demonstrated by coding HalfWaveRectifyASM( )
Target. Changing this C++ code into assembly (to get “more” speed)
Code we generated yesterday was similar to parts of this, but not equivalent.
Re-factor the code to make the assembly code and C++ functionality equivalent
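The C++ listing from the slide is not reproduced in this text. A minimal sketch of a half-wave rectifier of the kind being discussed might look like the following. The function name, the parameter names, and the rule of returning NULL when the size is not positive are assumptions pieced together from the surrounding slides, not the course's actual code.

```cpp
#include <cassert>

// Hypothetical reference version: copies initial[] into final_array[],
// replacing negative values with zero. Returns the start of final_array,
// or NULL (0) when size is not positive (assumed behaviour).
int* HalfWaveRectifySketch(const int initial[], int final_array[], int size) {
    if (size <= 0) {
        return nullptr;                       // conditional return value
    }
    for (int count = 0; count < size; count++) {
        if (initial[count] < 0) {
            final_array[count] = 0;           // rectify negative values
        } else {
            final_array[count] = initial[count];
        }
    }
    return final_array;                       // start of the final array
}
```

Each piece of this sketch corresponds to one item on the "what we need to know" list: a return value, parameters, an IF THEN ELSE, a loop, and memory reads and writes.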
The code was not exactly what we designed (C++ equivalent) – re-factor and retest after the re-factoring
NEXT STEP
Refactored C++ code
I THINK I UNDERSTAND ENOUGH TO CHANGE THE FORMAT OF THE IF-THEN-ELSE TO OPTIMIZE THIS PARTICULAR CODE BIT
USE: IF TRUE EXECUTE THIS STATEMENT – SINGLE LINE
Avoiding JUMPS in the main flow of the code will speed the flow of the code
Almost right. SYNTAX ERROR
Look in the manual to find the correct syntax
IF NJLE; DO, J8 = 0
No syntax errors (No CODE ERRORS).
Code does not work (CODE DEFECTS)
We don’t have enough code to pass all the tests, but we are failing tests we did not expect to fail
Run “forensic tests” to find out where DEFECT is being introduced
Identify mistake by removing “code sections”
Without the IF
Add another line to the code. Can now spot the error
New format of IF-THEN-ELSE is doing exactly the opposite of what we want
IF NOT TRUE return NULL (0)
Need JLE not NJLE
Assignment 1 – code the following as a software loop – follow MIPS / Blackfin approach
DONE DURING TUTORIAL
int CalculateSum(void) {
    int sum = 0;
    for (int count = 0; count < 6; count++) {
        sum = sum + count;
    }
    return sum;
}
Reminder – software for-loop becomes “while loop” with initial test
int CalculateSum(void) {
    int sum = 0;
    int count = 0;
    while (count < 6) {
        sum = sum + count;
        count++;
    }
    return sum;
}
Do line by line translation into assembly code
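One way to prepare for the line-by-line translation is to first rewrite the while loop in C++ using only labels, a single test, and jumps, which is roughly the shape the assembly code will take. This is a sketch only: the label names and the register-style variable names are illustrative assumptions.

```cpp
#include <cassert>

// Same calculation as CalculateSum(), written with explicit jumps so that
// each line maps onto roughly one assembly instruction.
int CalculateSum_GotoStyle(void) {
    int sum_J8 = 0;                  // sum = 0
    int count_J0 = 0;                // count = 0

LOOP_TEST:
    if (!(count_J0 < 6)) {           // loop control first: leave when NOT (count < 6)
        goto LOOP_END;
    }
    sum_J8 = sum_J8 + count_J0;      // sum = sum + count
    count_J0++;                      // count++
    goto LOOP_TEST;                  // back to the initial test

LOOP_END:
    return sum_J8;                   // 0 + 1 + 2 + 3 + 4 + 5
}
```

Writing the exit test as "NOT less than" rather than "greater than or equal" mirrors the way the exit condition ends up being expressed in the assembly version.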
USE SOFTWARE LOOP HERE
Do loop control first
Have some jumps too close together
NOTE: JGE is ILLEGAL. USE NJLT
Customize? #define JGE NJLT
Run the tests with 4-nop padding to check that we get out of the loop as expected
Adding 4 nops: lose 1 cycle, gain an hour by not trying to solve the problem
If you need the 1 cycle, refactor the code later
Accessing memory Basic mode
Special register J31 – acts as zero when used in additions
Pt_J5 is a pointer register into an array
Value_J1 is being used as a data register
J registers are like MIPS registers (used as pointer and data).
NOT like 68K or Blackfin registers – those can be used as either data or address registers but not both
NOTE: Later we will find that using TigerSHARC registers for data operations is a BAD idea
1. Value_J1 = [Pt_J5];; read value from memory location pointed to by J5 -- Compare to Blackfin Value_R0 = [Pt_P0];;
2. Value_J1 = [Pt_J5 + J31];; read value from memory location pointed to by J5 – but read somewhere that this CAN be faster than just Value_J1 = [Pt_J5];; -- NEED TO CONFIRM
Accessing memory – step 2 Basic mode
Pt_J5 is a pointer register into an array
Offset_J4 is used as an offset
Value_J1 is being used as a data register to receive the memory value – load / store architecture
1. Read_J1 = [Pt_J5 + Offset_J4];; read value from memory location pointed to by (J5 + J4)
PRE-MODIFY – address used J5 + J4, no change in J5
2. Read_J1 = [Pt_J5 += Offset_J4];; read value from memory location pointed to by J5, and then perform add operation on the J5 register (points to NEXT location)
POST-MODIFY – address used J5, then perform J5 = J5 + J4
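In C++ pointer terms, the two addressing modes can be sketched as below. The function names are hypothetical, and C++ pointer arithmetic counts in array elements, which may not match the units the hardware uses for the offset register.

```cpp
#include <cassert>

// PRE-MODIFY: the address used is (pt + offset); pt itself is not changed.
int ReadPreModify(const int* pt, int offset) {
    return *(pt + offset);
}

// POST-MODIFY: the address used is pt; afterwards pt = pt + offset.
// The pointer is passed by reference so that the caller sees the update.
int ReadPostModify(const int*& pt, int offset) {
    const int value = *pt;
    pt += offset;
    return value;
}
```

With an offset of 1, repeated calls to ReadPostModify walk through an array one element at a time, which is the pattern a rectification loop needs.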
Add in the memory accesses
FORGET: TigerSHARC = RISC PROCESSOR
LOAD/STORE ONLY, like MIPS and Blackfin
Must place value into register, and then copy register to memory
NO: [J5 + J0] = 0;
NO: J3 = 0; [J5 + J0] = J3; Uses wrong J3. Remember TigerSHARC can handle parallel instructions
YES: J3 = 0;; [J5 + J0] = J3;;
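The "uses wrong J3" remark can be illustrated with a toy C++ model. It assumes that all instructions sharing one instruction line read the register values from before that line, which is how the slide's remark is read here. The Regs structure and the function names are hypothetical, and J5 is treated as a plain array index rather than a real address.

```cpp
#include <cassert>

struct Regs { int J0; int J3; int J5; };   // toy register file

// Models "J3 = 0; [J5 + J0] = J3;;" as ONE instruction line:
// the store reads the J3 value from before the line.
void StoreZero_OneLine(Regs& r, int memory[]) {
    const Regs before = r;                 // register values entering the line
    r.J3 = 0;
    memory[before.J5 + before.J0] = before.J3;   // old J3 is stored
}

// Models "J3 = 0;; [J5 + J0] = J3;;" as TWO instruction lines:
// the store is in a later line, so it reads the updated J3.
void StoreZero_TwoLines(Regs& r, int memory[]) {
    r.J3 = 0;
    memory[r.J5 + r.J0] = r.J3;            // new J3 (zero) is stored
}
```

In this model the one-line version writes whatever J3 held earlier, while the two-line version writes zero as intended.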
Understand the error message
Too many J resource usage = missing ;;
Unintentionally doing the parallel instruction line
[J5 + J0] = J2; J0 = J0 + 1;;
Note: Missing label is not an assembler error, it’s a linker error
Fix warnings
DEFECT: it may be days before you try to link, and then it is hard to find
NOW that the assembler knows where “CONTINUE” is, it can tell you that you have two JUMP instructions too close together. Fix with the magic 4 nops and lose one cycle / loop
Not getting expected test results. Something is logically wrong (DEFECT)
Obvious question: are we even getting into the loop? Add BREAKPOINT to TEST code flow. (We don’t add BREAKPOINTS to follow the code in detail)
CODE NEVER GOT TO BREAKPOINT, which means the code never entered the loop
Forgot to do count = 0
So not even getting into the loop, as there is a garbage value already in Count_J0 from code we executed earlier -- DEFECT
Not bad for a first effort. Faster than compiler in debug mode
Where did the float ASM code suddenly appear from?
Integer 0 has bit pattern 0x0000 0000
Float 0.0 has bit pattern 0x0000 0000
Integer +6 has format b 0??? ???? ???? ???? ???? ???? ???? ????
Float +6.0 has format b 0??? ???? ???? ???? ???? ???? ???? ????
Integer -6 has format b 1??? ???? ???? ???? ???? ???? ???? ????
Float -6.0 has format b 1??? ???? ???? ???? ???? ???? ???? ????
Formats are very different, but the sign bit is in the same place
Float algorithm: if S == 1 (negative) set to zero, otherwise leave unchanged. Same as integer algorithm
Just re-use integer algorithm with a change of name
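The re-use argument can be checked in C++ by treating the bits of a float as a 32-bit integer. The sketch below assumes a 32-bit IEEE-754 float and a two's-complement int32_t; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Integer rule: a value with the sign bit set (negative) becomes zero.
int32_t RectifyInt(int32_t value) {
    return (value < 0) ? 0 : value;
}

// Float version that re-uses the integer rule on the raw bit pattern.
float RectifyFloatViaInt(float value) {
    static_assert(sizeof(float) == sizeof(int32_t), "assumes a 32-bit float");
    int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // view the float bits as an integer
    bits = RectifyInt(bits);                   // sign bit set -> all bits zero
    std::memcpy(&value, &bits, sizeof bits);   // view the result as a float again
    return value;
}
```

A negative float has its sign bit set, so the integer test sees a negative number and replaces it with the all-zero pattern, which is also the pattern for float 0.0.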
Final code – Float rectify code just has a different name
What we NOW KNOW
Can we return from an assembly language routine without crashing the processor?
Return a parameter from an assembly language routine (Is it the same for ints and floats?)
Pass parameters into assembly language (Is it the same for ints and floats?)
Do IF THEN ELSE statements
Read and write values to memory
Read and write values in a loop
Do some mathematics on the values fetched from memory
All this stuff is demonstrated by coding HalfWaveRectifyASM( )