
Generating a software loop with memory accesses

Date post: 14-Jan-2016
Page 1: Generating a software loop with memory accesses

Generating a software loop with memory accesses

TigerSHARC assembly syntax

Page 2: Generating a software loop with memory accesses

04/21/23 TigerSHARC assemble code 2, M. Smith, ECE, University of Calgary, Canada

Concepts

Learning just enough TigerSHARC assembly code to make a software loop “work”

Comparing the timings for rectification of integer and floating point arrays, using debug C++ code, Release C++ code, and our FIRST_ASM code

Looking in “MIXED mode” at the code generated by the compiler

Page 3: Generating a software loop with memory accesses

Test Driven Development

Describe Requirements

Design Solution

Build Solution

Test Solution

Write Acceptance Tests

Write Unit Tests

CUSTOMER

DEVELOPER

Work with customer to check that the tests properly express what the customer wants done. Iterative process with customer “heavily involved” – “Agile” methodology.

Page 4: Generating a software loop with memory accesses

Note

Special marker

Compiler optimization

FLOATS 927 304 -- THREE FOLD

INTS 960 150 -- SIX FOLD

Why the difference? Can we do better, and do we want to?

Note the failures. What are they?
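The "THREE FOLD" and "SIX FOLD" labels can be checked with simple division. This is a minimal sketch, assuming each pair of numbers is the timing before and after compiler optimization (the slide does not state the units). The function name speedupFactor is hypothetical.

```cpp
#include <cassert>

// Ratio of the slower timing to the faster timing.
// Assumption: the slide's pairs (927, 304) and (960, 150) are
// "before optimization" and "after optimization" timings.
double speedupFactor(double slowerTiming, double fasterTiming) {
    assert(fasterTiming > 0.0);  // guard against division by zero
    return slowerTiming / fasterTiming;
}
```

With the slide's numbers, speedupFactor(927, 304) is a little over 3 and speedupFactor(960, 150) is 6.4, which matches the labels.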

Page 5: Generating a software loop with memory accesses

Write tests about passing values back from an assembly code routine

Page 6: Generating a software loop with memory accesses

More detailed look at the code

Single semi-colons
Double semi-colons

Start function label
End function label

Used for “profiling code”

Label format similar to 68K. Needs leading underscore and final colon

As with 68K and Blackfin, needs a .section, but name and format different

As with 68K, need .align statement. Is the “4” in bytes (8 bits) or words (32 bits)?

As with 68K, need .global to tell other code that this function exists

Page 7: Generating a software loop with memory accesses

Return registers
There are many, depending on what you need to return
Here we need to use J8 as the return register to pass back “integer” pointer

Many registers available – need ability to control usage
J0 to J31 – registers (integers and pointers) (SISD mode)
XR0 to XR31 – registers (integers) (SISD mode)
XFR0 to XFR31 – registers (floats) (SISD mode)

Did I also mention
I0 to I31 – registers (integers and pointers) (SISD mode)
YR0 to YR31, YFR0 to YFR31 (SIMD mode)
XYR, YXR and R registers (SIMD mode)
And also the MIMD modes
And the double registers and the quad registers …….

#define return_pt_J8 J8 // J8 is a VOLATILE, NON-PRESERVED register

Page 8: Generating a software loop with memory accesses

Parameter passing

SPACES for first four parameters ARE ALWAYS present on the stack (as with 68K)

But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS and Blackfin)

The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) as the first step when assembly code functions call assembly code functions

J4, J5, J6 and J7 are volatile, non-preserved registers

Page 9: Generating a software loop with memory accesses

Can we pass back the start of the final array?

Still passing tests by accident, and this needs to be a conditional return value

Page 10: Generating a software loop with memory accesses

What we need to know based on experiences from other processors

Can we return from an assembly language routine without crashing the processor?

Return a parameter from assembly language routine (Is it same for ints and floats?)

Pass parameters into assembly language (Is it same for ints and floats?)

Do IF THEN ELSE statements

Read and write values to memory

Read and write values in a loop

Do some mathematics on the values fetched from memory

All this stuff is demonstrated by coding HalfWaveRectifyASM( )

Page 11: Generating a software loop with memory accesses

Why is ELSE a keyword

FOUR PART ELSE INSTRUCTION IS LEGAL

IF JLT; ELSE, J1 = J2 + J3;      // Conditional execution – if true
        ELSE, XR1 = XR2 + XR3;   // Conditional – if true
        YFR1 = YFR2 + YFR3;;     // Unconditional -- always

IF JLT; DO, J1 = J2 + J3;        // Conditional execution -- if true
        DO, XR1 = XR2 + XR3;     // Conditional -- if true
        YFR1 = YFR2 + YFR3;;     // Unconditional -- always

Having this sort of format means that the instruction pipeline is not disrupted when we do IF statements

Page 12: Generating a software loop with memory accesses

Label name is not the problem

NOTE: This is “C-like” syntax, but it is not “C”

Statement must end in ;;
Not ;

ONE semicolon = end of instruction
TWO semicolons = end of parallel instruction line

Page 13: Generating a software loop with memory accesses

Add dual-semicolons everywhere
Worry about “multiple issues” later

This dual semi-colon is so important that you MUST code review for it all the time or else you waste so much time in the Lab. Key in exams / quizzes

At last an error I know how to fix

Page 14: Generating a software loop with memory accesses

Well I thought I understood it !!!

Speed issue – JUMP instructions can’t be too close together when stored in memory

Not normally a problem when “if” code is larger

Page 15: Generating a software loop with memory accesses

Add a single instruction of 4 NOPs
    nop; nop; nop; nop;;
TEMPORARY

Fix the last error as part of Assignment 1

Fix the remaining error in handling the IF THEN ELSE as part of Assignment 1

Worry about code efficiency later (refactor) when all code working

Page 16: Generating a software loop with memory accesses

What we need to know based on experiences from other processors

Can we return from an assembly language routine without crashing the processor?

Return a parameter from assembly language routine (Is it same for ints and floats?)

Pass parameters into assembly language (Is it same for ints and floats?)

Do IF THEN ELSE statements

Read and write values to memory

Read and write values in a loop

Do some mathematics on the values fetched from memory

All this stuff is demonstrated by coding HalfWaveRectifyASM( )

Page 17: Generating a software loop with memory accesses

Target. Changing this C++ code into assembly (to get “more” speed)

Code we generated yesterday was similar to parts of this, but not equivalent.

Re-factor the code to make the assembly code and C++ functionality equivalent
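The C++ target code itself is not part of this transcript. As a reminder of the behaviour being translated, here is a minimal sketch of a half-wave rectifier in C++. The function name HalfWaveRectifySketch, the parameter order, and the rule "return NULL when the length is not positive" are assumptions pieced together from the other slides (pointer returned in J8, conditional NULL return), not the original code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference version: copies input to output, replacing
// negative values with zero. Returns the start of the output array,
// or a null pointer when numPoints is not positive (an assumption).
int* HalfWaveRectifySketch(const int* input, int* output, int numPoints) {
    if (numPoints <= 0) {
        return nullptr;                 // conditional return value
    }
    assert(input != nullptr && output != nullptr);

    for (int count = 0; count < numPoints; count++) {
        int value = input[count];       // read value from memory
        if (value < 0) {
            value = 0;                  // rectify: negative becomes zero
        }
        output[count] = value;          // write value to memory
    }
    return output;                      // start of the final array
}
```

Each commented step corresponds to one item on the "what we need to know" list: parameter passing, IF THEN ELSE, memory reads and writes, a loop, and a returned pointer.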

Page 18: Generating a software loop with memory accesses

The code was not exactly what we designed (C++ equivalent) – re-factor and retest after the re-factoring

NEXT STEP

Page 19: Generating a software loop with memory accesses

Refactored C++ code

I THINK I UNDERSTAND ENOUGH TO CHANGE THE FORMAT OF THE IF-THEN-ELSE TO OPTIMIZE THIS PARTICULAR CODE BIT

USE: IF TRUE EXECUTE THIS STATEMENT – SINGLE LINE

Avoiding JUMPS in the main flow of the code will speed the flow of the code

Almost right. SYNTAX ERROR

Look in the manual to find the correct syntax

IF NJLE; DO, J8 = 0

Page 20: Generating a software loop with memory accesses

No syntax errors (No CODE ERRORS).

Code does not work (CODE DEFECTS)

We don’t have enough code to pass all the tests, but we are failing tests we did not expect to fail

Page 21: Generating a software loop with memory accesses

Run “forensic tests” to find out where DEFECT is being introduced

Identify mistake by removing “code sections”

Without the IF

Page 22: Generating a software loop with memory accesses

Add another line to the code
Can now spot the error

New format of IF-THEN-ELSE is doing exactly the opposite of what we want

IF NOT TRUE return NULL (0)

Need JLE not NJLE
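The inverted test is easy to see in a C++ model of the single-line conditional. This is a sketch only: it assumes the condition compares the array length against zero, which the transcript does not show, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Models "IF JLE; DO, J8 = 0": the return value starts as the array
// pointer and is overwritten with NULL only when the test is true.
int* returnValueWithJLE(int* finalArray, int numPoints) {
    assert(finalArray != nullptr);
    int* returnValue = finalArray;
    if (numPoints <= 0) {               // JLE-style test (wanted)
        returnValue = nullptr;
    }
    return returnValue;
}

// Models "IF NJLE; DO, J8 = 0": same code with the test negated,
// so every valid length returns NULL. This is the DEFECT.
int* returnValueWithNJLE(int* finalArray, int numPoints) {
    assert(finalArray != nullptr);
    int* returnValue = finalArray;
    if (!(numPoints <= 0)) {            // NJLE-style test (opposite)
        returnValue = nullptr;
    }
    return returnValue;
}
```

Both versions assemble cleanly, which is why only the tests reveal the problem.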

Page 23: Generating a software loop with memory accesses

Assignment 1 – code the following as a software loop – follow MIPS / Blackfin approach

DONE DURING TUTORIAL

int CalculateSum(void) {
    int sum = 0;
    for (int count = 0; count < 6; count++) {
        sum = sum + count;
    }
    return sum;
}

Page 24: Generating a software loop with memory accesses

Reminder – software for-loop becomes “while loop” with initial test

int CalculateSum(void) {
    int sum = 0;
    int count = 0;
    while (count < 6) {
        sum = sum + count;
        count++;
    }
    return sum;
}

Do line by line translation into assembly code

Page 25: Generating a software loop with memory accesses

USE SOFTWARE LOOP HERE
Do loop control first
Have some jumps too close together

NOTE: JGE is ILLEGAL
USE NJLT

Customize?
#define JGE NJLT
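"Do loop control first" can be sketched in C++ by writing the while loop with explicit jumps, which is close to the shape the assembly code takes. The label names and the function CalculateSumJumpStyle are hypothetical. The exit test is written as "not less than" to mirror the use of NJLT in place of the illegal JGE.

```cpp
#include <cassert>

// While-loop form, as on the earlier slide.
int CalculateSumWhile() {
    int sum = 0;
    int count = 0;
    while (count < 6) {
        sum = sum + count;
        count++;
    }
    return sum;
}

// Same loop with explicit jumps: test at the top, jump back at the bottom.
int CalculateSumJumpStyle() {
    int sum = 0;
    int count = 0;          // forgetting this leaves a garbage count
loop_test:
    if (!(count < 6)) {     // "not less than" exit test (NJLT style)
        goto loop_exit;
    }
    sum = sum + count;
    count++;
    goto loop_test;
loop_exit:
    return sum;
}

// Quick self-check that the two forms agree.
void checkLoopForms() {
    assert(CalculateSumWhile() == CalculateSumJumpStyle());
}
```

The two goto statements are the two jumps that end up "too close together" once the loop body is this short.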

Page 26: Generating a software loop with memory accesses

Run the tests with 4 nop padding to check that we get out of the loop as expected

Adding 4 nops -- lose 1 cycle, gain an hour not trying to solve the problem

If you need the 1 cycle, refactor the code later

Page 27: Generating a software loop with memory accesses

Accessing memory
Basic mode

Special register J31 – acts as zero when used in additions

Pt_J5 is a pointer register into an array
Value_J1 is being used as a data register
J registers are like MIPS registers (used as pointer and data)

NOT like 68K or Blackfin registers – those can be used as either data or address registers but not both

NOTE: Later we will find that using TigerSHARC registers for data operations is a BAD idea

1. Value_J1 = [Pt_J5];; read value from memory location pointed to by J5 -- Compare to Blackfin Value_R0 = [Pt_P0];;

2. Value_J1 = [Pt_J5 + J31];; read value from memory location pointed to by J5 – but read somewhere that this CAN be faster than just Value_J1 = [Pt_J5];; -- NEED TO CONFIRM

Page 28: Generating a software loop with memory accesses

Accessing memory – step 2
Basic mode

Pt_J5 is a pointer register into an array
Offset_J4 is used as an offset
Value_J1 is being used as a data register to receive the memory value – load / store architecture

1. Read_J1 = [Pt_J5 + Offset_J4];; read value from memory location pointed to by (J5 + J4)

PRE-MODIFY – address used J5 + J4, no change in J5

2. Read_J1 = [Pt_J5 += Offset_J4];; read value from memory location pointed to by J5, and then perform add operation on the J5 register (points to NEXT location)

POST-MODIFY – address used J5, then perform J5 = J5 + J4
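The difference between the two addressing forms can be sketched with C++ pointers. This is a model only: the function names are hypothetical, and C++ pointer arithmetic on int* steps in whole elements, which is assumed here to play the role of the word offset in Offset_J4.

```cpp
#include <cassert>

// PRE-MODIFY, like Read_J1 = [Pt_J5 + Offset_J4]:
// address used is pointer + offset, and the pointer is NOT changed.
int readPreModify(const int* pointer, int offset) {
    assert(pointer != nullptr);
    return *(pointer + offset);
}

// POST-MODIFY, like Read_J1 = [Pt_J5 += Offset_J4]:
// address used is the pointer itself, THEN pointer = pointer + offset.
int readPostModify(const int*& pointer, int offset) {
    assert(pointer != nullptr);
    int value = *pointer;
    pointer += offset;      // pointer now refers to the NEXT location
    return value;
}
```

With an array {10, 20, 30, 40} and an offset of 2, the pre-modify read returns 30 and leaves the pointer at the start, while the post-modify read returns 10 and leaves the pointer at the element holding 30.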

Page 29: Generating a software loop with memory accesses

Add in the memory accesses
FORGET TigerSHARC = RISC PROCESSOR

LOAD/STORE ONLY
Like MIPS and Blackfin

Must place value into register, and then copy register to memory

NO
    [J5 + J0] = 0;

NO
    J3 = 0; [J5 + J0] = J3;
Uses wrong J3 – Remember TigerSHARC can handle parallel instructions

YES
    J3 = 0;;
    [J5 + J0] = J3;
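Why the middle version "uses wrong J3" can be modelled in C++. The sketch below assumes the behaviour the slide describes: instructions joined by a single semicolon run as one parallel line, so the store sees the J3 value from before the line started. The register struct, the small memory array, and the function names are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct JRegisters { int J0; int J3; int J5; };
using Memory = std::array<int, 8>;

// Models "J3 = 0; [J5 + J0] = J3;;" (one parallel line):
// every instruction reads the register values from BEFORE the line.
void storeParallelLine(Memory& memory, JRegisters& regs) {
    const JRegisters before = regs;
    const std::size_t address = static_cast<std::size_t>(before.J5 + before.J0);
    assert(address < memory.size());
    regs.J3 = 0;
    memory[address] = before.J3;    // old J3 is stored, not zero
}

// Models "J3 = 0;;" followed by "[J5 + J0] = J3;;" (two lines):
// the store runs after J3 has been updated.
void storeTwoLines(Memory& memory, JRegisters& regs) {
    regs.J3 = 0;
    const std::size_t address = static_cast<std::size_t>(regs.J5 + regs.J0);
    assert(address < memory.size());
    memory[address] = regs.J3;      // zero is stored
}
```

If J3 holds 99 on entry, the parallel model writes 99 to memory while the two-line model writes 0.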

Page 30: Generating a software loop with memory accesses

Understand the error message
Too many J resource usage = missing ;;

Unintentionally doing the parallel instruction line

[J5 + J0] = J2; J0 = J0 + 1;;

Page 31: Generating a software loop with memory accesses

Note: Missing label is not an assembler error, it’s a linker error

Fix warnings

DEFECT: it may be days before you try to link, and then it is hard to find

Page 32: Generating a software loop with memory accesses

NOW that the assembler knows where “CONTINUE” is, it can tell you that you have two JUMP instructions too close together

Fix with magic 4 nops, and lose one cycle / loop

Page 33: Generating a software loop with memory accesses

Not getting expected Test results
Something is logically wrong (DEFECT)

Page 34: Generating a software loop with memory accesses

Obvious question – are we even getting into the loop? Add BREAKPOINT to TEST code flow. (We don’t add BREAKPOINTS to follow the code in detail)

CODE NEVER GOT TO BREAKPOINT means code never entered loop

Forgot to do count = 0

So not even getting into loop, as there is a garbage value already in Count_J0 from code we executed earlier -- DEFECT

Page 35: Generating a software loop with memory accesses

Not bad for a first effort
Faster than compiler in debug mode

Page 36: Generating a software loop with memory accesses

Where did the float ASM code suddenly appear from?

Integer 0 has bit pattern 0x0000 0000
Float 0.0 has bit pattern 0x0000 0000

Integer +6 has format b 0??? ???? ???? ???? ???? ???? ???? ????
Float +6.0 has format b 0??? ???? ???? ???? ???? ???? ???? ????
Integer -6 has format b 1??? ???? ???? ???? ???? ???? ???? ????
Float -6.0 has format b 1??? ???? ???? ???? ???? ???? ???? ????

Formats are very different, but the sign bit is in the same place

Float algorithm - if S == 1 (negative) set to zero
Otherwise leave unchanged – same as integer algorithm
Just re-use integer algorithm with a change of name

EXPONENT
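The sign-bit argument can be tried out in C++ by treating the float's bit pattern as a signed integer. This is a sketch assuming a 32-bit IEEE 754 float; the function name rectifyFloatWithIntegerTest is hypothetical, and std::memcpy is used to copy the bits without changing them.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Rectifies a float using only an integer "less than zero" test.
// Assumption: float is 32 bits with the sign in the top bit.
float rectifyFloatWithIntegerTest(float value) {
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "sketch assumes a 32-bit float");

    std::int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);    // same bits, integer view

    if (bits < 0) {     // sign bit S == 1, so the float is negative
        bits = 0;       // integer 0 and float 0.0 share pattern 0x00000000
    }

    float result;
    std::memcpy(&result, &bits, sizeof result); // back to the float view
    assert(!(result < 0.0f));                   // never negative afterwards
    return result;
}
```

The integer comparison never looks at the exponent or fraction bits, which is why the integer routine can be reused for floats under a different name. One side effect of this sketch is that -0.0 also becomes +0.0.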

Page 37: Generating a software loop with memory accesses

Final code – Float rectify code just has a different name

Page 38: Generating a software loop with memory accesses

What we NOW KNOW

Can we return from an assembly language routine without crashing the processor?

Return a parameter from assembly language routine (Is it same for ints and floats?)

Pass parameters into assembly language (Is it same for ints and floats?)

Do IF THEN ELSE statements

Read and write values to memory

Read and write values in a loop

Do some mathematics on the values fetched from memory

All this stuff is demonstrated by coding HalfWaveRectifyASM( )

