GCC Autovectorization A journey through compiler options, SIMD extensions and C standards Andreas Schmitz Seminar: Automation, Compilers, and Code-Generation 06.07.2016
Page 1: GCC Autovectorization - A journey through compiler options ...hpac.rwth-aachen.de/teaching/sem-accg-16/slides/08... · GCC Autovectorization - A journey through compiler options,

GCC Autovectorization

A journey through compiler options, SIMD extensions and C standards

Andreas Schmitz

Seminar: Automation, Compilers, and Code-Generation
06.07.2016


Motivation

What is vectorization?
- Perform one operation on multiple elements of a vector
- Chunk-wise processing instead of element-wise
- Can improve computing time

Motivation
- Utilize the CPU’s vectorization features
- Produce fast and small binaries

2 GCC AutovectorizationAndreas Schmitz | Seminar: Automation, Compilers, and Code-Generation | 06.07.2016


Disclaimer

- The following concentrates only on C11 and GCC 5.3
- Some of the shown code snippets / directives may also apply to C++, older C standards or other compilers


Agenda

Basics
- Memory Alignment
- Pointer Aliasing
- (Intel) SIMD Extensions

Empiric Analysis of GCC’s autovectorization
- GCC Compiler & Compiler Flags
- Autovectorization Examples

Autovectorization Requirements and Limitations

Conclusion

References


Basics


Memory Alignment I

Overview
- Data is stored in memory aligned or unaligned
  - Aligned: the address is a multiple of the alignment
- Some architectures need data to be aligned
- Intel: unaligned data access is possible, but:
  - Computation overhead
  - Multiple reads necessary
  - Additional code to extract the data
- Data (structures) can be aligned by adding padding
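The overview can be made concrete with a small sketch. It is not part of the slides: the type `struct Mixed` and both helper functions are made-up names, and the sizes mentioned in the comments are the usual ones, not guaranteed ones. The first helper tests whether an address is a multiple of an alignment, the second reports how much padding the compiler inserted to align a member.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical example type: a char followed by an int. */
struct Mixed {
    char c;   /* 1 byte */
    int  i;   /* usually 4 bytes, usually aligned to 4 bytes */
};

/* Returns 1 if the address p is a multiple of alignment, 0 otherwise. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Number of padding bytes the compiler placed between c and i. */
size_t padding_before_i(void)
{
    return offsetof(struct Mixed, i) - sizeof(char);
}
```

Calling `is_aligned(&m.i, _Alignof(int))` on any `struct Mixed m` should return 1, because the padding exists precisely to keep `i` on an address that is a multiple of its alignment.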


Memory Alignment II

Dealing with Alignment

Directives to control the alignment behavior

GCC specific [FSF15, 6.38]
- __attribute__ ((aligned (ALIGN)))
- __attribute__ ((packed))
- Used with: struct and union or simply arrays

C11 Standard [ISO11, 6.2.8, 7.22.3]
- aligned_alloc(size_t alignment, size_t size);
- _Alignas(expression) and _Alignas(type)
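A minimal sketch of the two C11 facilities, assuming a hosted C11 implementation that provides `aligned_alloc`. The element count `N`, the buffer name and the helper functions are invented for illustration and do not appear in the slides.

```c
#include <stdint.h>   /* uintptr_t */
#include <stdlib.h>   /* aligned_alloc, free */

#define N 1024  /* hypothetical element count */

/* Static array whose first element is aligned to 32 bytes (C11 _Alignas). */
static _Alignas(32) double static_buf[N];

double *static_buffer(void)
{
    return static_buf;
}

/* Heap array aligned to 32 bytes. C11 requires the size to be a multiple
 * of the alignment, which holds here for the usual 8-byte double.
 * Returns NULL on failure; release the memory with free(). */
double *alloc_aligned_doubles(void)
{
    return aligned_alloc(32, N * sizeof(double));
}

/* Returns 1 if p is a multiple of 32, 0 otherwise. */
int is_aligned32(const void *p)
{
    return ((uintptr_t)p % 32) == 0;
}
```

32 bytes is chosen because it matches the width of a 256-bit AVX register, the same value the later loop examples use.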


Memory Alignment III

Examples

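    /* 6 bytes of data, aligned (and padded) to 8 bytes */
    struct V { short s[3]; } __attribute__((aligned(8)));

    /* 2-byte array placed at an address that is a multiple of 8 */
    char c[2] __attribute__((aligned(8)));

    /* no padding between a and b */
    struct A { char a; int b; } __attribute__((packed));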
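The effect of both attributes can be inspected with `sizeof`, `_Alignof` and `offsetof`. The sketch below repeats two of the example types and adds `struct U`, a hypothetical unpacked counterpart of `struct A` that is not in the slides. With GCC on x86-64 one would typically see a size of 8 for `struct V` and a size of 5 for `struct A`, but these values are platform dependent.

```c
#include <stddef.h>  /* size_t, offsetof */

struct V { short s[3]; } __attribute__((aligned(8)));
struct A { char a; int b; } __attribute__((packed));
struct U { char a; int b; };   /* hypothetical unpacked counterpart of A */

/* Size of V: the 6 data bytes rounded up to a multiple of the alignment. */
size_t size_of_V(void)  { return sizeof(struct V); }

/* Alignment of V as requested by the attribute. */
size_t align_of_V(void) { return _Alignof(struct V); }

/* Offset of b with and without the packed attribute. */
size_t offset_b_packed(void)   { return offsetof(struct A, b); }
size_t offset_b_unpacked(void) { return offsetof(struct U, b); }
```

Note that packing trades space for potentially unaligned member accesses, which is the opposite of what vectorized code usually wants.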


Pointer Aliasing I

Overview
- Refers to memory addressed by different names
- Example: char b; char *a = &b;
- Needs to be considered by the compiler
- Can result in code overhead (next slide)


Pointer Aliasing II

    void foo(int *a, int *b, int *c) {
        *a = 42;
        *b = 23;
        *c = *a;
    }

Figure: Pointer Aliasing, C Code

    mov DWORD PTR [rdi], 42
    mov DWORD PTR [rsi], 23
    mov eax, DWORD PTR [rdi]
    mov DWORD PTR [rdx], eax

Figure: Pointer Aliasing, Resulting Assembly Code
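The reload of `[rdi]` in the assembly is needed because `a` and `b` may point to the same object. The following sketch shows that case; `foo` is the function from the slide, while the two caller functions are hypothetical helpers added for illustration.

```c
/* Same function as on the slide. */
void foo(int *a, int *b, int *c)
{
    *a = 42;
    *b = 23;
    *c = *a;   /* must reload *a: the store to *b may have changed it */
}

/* Hypothetical helper: a and b point to the same int. */
int foo_with_aliases(void)
{
    int x = 0, y = 0;
    foo(&x, &x, &y);
    return y;   /* 23, because *b = 23 overwrote *a */
}

/* Hypothetical helper: a and b point to different ints. */
int foo_without_aliases(void)
{
    int x = 0, z = 0, y = 0;
    foo(&x, &z, &y);
    return y;   /* 42 */
}
```

Since both calls are legal C, the compiler cannot replace the load with the constant 42 unless it is told that the pointers do not alias.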


Pointer Aliasing III

restrict Keyword [ISO07, §6.7.3.1]

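C99 keyword with which the programmer asserts that the object a pointer refers to is not accessed through any other pointer in the same scope (no aliases)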


Pointer Aliasing IV

    void foo(int *restrict a, int *restrict b, int *c) {
        *a = 42;
        *b = 23;
        *c = *a;
    }

Figure: Resolving Pointer Aliasing, C Code

    mov DWORD PTR [rdi], 42
    mov DWORD PTR [rsi], 23
    mov DWORD PTR [rdx], 42

Figure: Resolving Pointer Aliasing, Resulting Assembly


Pointer Aliasing V

Remarks
- restrict needs to be used carefully
- Programmer is responsible for proper usage
- Mishandling can lead to wrong programs
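What proper usage means can be sketched as follows. The function `add_into` and its caller are hypothetical and not taken from the slides. The call with two distinct arrays honors the promise made by `restrict`; the commented-out call with overlapping ranges would break it and has undefined behavior.

```c
#include <stddef.h>  /* size_t */

/* restrict: the caller promises that dst and src do not overlap. */
void add_into(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Hypothetical caller demonstrating a valid call. */
double valid_call(void)
{
    double x[4] = {1.0, 2.0, 3.0, 4.0};
    double y[4] = {10.0, 20.0, 30.0, 40.0};
    add_into(x, y, 4);          /* fine: x and y are distinct arrays */
    /* add_into(x + 1, x, 3);      undefined behavior: the ranges overlap */
    return x[0] + x[1] + x[2] + x[3];
}
```

The compiler does not check the promise, so an invalid call usually compiles without any diagnostic.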


(Intel) SIMD Extensions I

SIMD Extension Overview
- Intel: MMX, SSE, SSE2, ..., AVX, AVX2, AVX-512
- ARM: NEON
- Have “bookkeeping” and initialization overhead
- SIMD extensions usually differ in:
  - size/number of the registers
  - operations
  - data types
  - ...

→ Typically require: aligned data, no pointer aliasing
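Which extension the compiler may use for a given translation unit can be queried at compile time. The sketch below relies on macros that GCC and Clang predefine when the corresponding extension is enabled (for example through `-march=native` or `-mavx2`); the function name is made up, and the mapping to register widths is a simplification that ignores MMX.

```c
/* Width in bits of the widest SIMD registers the compiler may use for this
 * translation unit, based on macros predefined by GCC and Clang.
 * Returns 0 if none of the checked extensions is enabled. */
int widest_simd_bits(void)
{
#if defined(__AVX512F__)
    return 512;
#elif defined(__AVX__)
    return 256;
#elif defined(__SSE2__) || defined(__ARM_NEON)
    return 128;
#else
    return 0;
#endif
}
```

The result describes the compile-time target only; it says nothing about the CPU the binary eventually runs on.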


(Intel) SIMD Extensions II

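Figure: x86-64 Vector Registers (ZMM0-ZMM31: 512 bits, YMM0-YMM31: 256 bits, XMM0-XMM31: 128 bits; each XMM register is the low part of the YMM register with the same number, which is the low part of the corresponding ZMM register)

- AVX-512 (ZMM0-ZMM31)
- AVX (YMM0-YMM15)
- SSE (XMM0-XMM15)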


(Intel) SIMD Extensions III

x86-64 Vector Operations - Overview [Lom11]

Example Instructions
- Move: (V)MOV[A/U]P[D/S]
- Comparing: (V)CMP[P/S][D/S]
- Arithmetic Operations: (V)[ADD/SUB/MUL/DIV][P/S][D/S]

Instruction Decoding
- V - AVX
- P, S - packed, scalar
- A, U - aligned, unaligned
- D, S - double, single
- B, W, D, Q - byte, word, doubleword, quadword integers
- [] - required, () - optional

Example: vmovapd ymm0, YMMWORD PTR [rdi+rax] (AVX move of aligned, packed doubles)
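A packed operation can also be written by hand with the vector extension of GCC and Clang. This is only a sketch to illustrate what "packed double" means; the slides themselves rely on autovectorization, and the type and function names below are invented. With AVX enabled, GCC would typically translate the addition into a single `vaddpd` on YMM registers; without AVX it falls back to narrower instructions.

```c
/* Four doubles packed into one 256-bit value (GCC/Clang vector extension). */
typedef double v4df __attribute__((vector_size(32)));

/* One packed addition: all four lanes are added at once. */
void add4(v4df *dst, const v4df *a, const v4df *b)
{
    *dst = *a + *b;
}

/* Hypothetical helper returning one lane of the sum for inspection. */
double add4_lane(int lane)
{
    v4df a = {1.0, 2.0, 3.0, 4.0};
    v4df b = {10.0, 20.0, 30.0, 40.0};
    v4df c;
    add4(&c, &a, &b);
    return c[lane];
}
```

The operands are passed by pointer so that the example does not depend on the calling convention for 256-bit values.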


Empiric Analysis of GCC’s autovectorization


GCC Compiler Flags

GCC Autovectorization Compiler Flags [FSF15]

-O -ftree-vectorize: Activate autovectorization

-O3: Optimizations including autovectorization

-fopt-info-vec, -fopt-info-vec-missed: List (not) vectorized loops + additional information

-march=native: Use instructions supported by the local CPU

-falign-functions=32, -falign-loops=32: Align the address of functions / loops to a multiple of 32 bytes
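As a usage sketch, the flags can be tried on a small translation unit. The file name `flags_demo.c` and the function `scale` are hypothetical; the command lines in the comment combine only flags listed above. What GCC reports depends on the compiler version and the target CPU, so no output is shown.

```c
/* flags_demo.c (hypothetical file name)
 *
 * Compile and ask GCC for a vectorization report, for example:
 *   gcc -O3 -march=native -fopt-info-vec -fopt-info-vec-missed -c flags_demo.c
 * or with explicit activation at a lower optimization level:
 *   gcc -O -ftree-vectorize -fopt-info-vec -c flags_demo.c
 */
#include <stddef.h>  /* size_t */

/* Scales n floats in place: a simple, countable loop. */
void scale(float *x, size_t n, float factor)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= factor;
}
```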


GCC Directives

GCC Vectorization pragmas [FSF15, 6.60.14]

#pragma GCC ivdep: programmer asserts no loop-carried dependencies
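A sketch of how the pragma is placed, modeled on the kind of loop the GCC manual uses for it; the function name is made up. The pragma applies to the loop that directly follows it, and the assertion it makes is the programmer's responsibility: if `a` did overlap `b` or `c`, the result could be wrong.

```c
/* Stores b[i] + c[i] into a[i]. The pragma tells GCC to assume that the loop
 * has no loop-carried dependencies, e.g. that a does not overlap b or c. */
void add_arrays(int n, int *a, const int *b, const int *c)
{
#pragma GCC ivdep
    for (int i = 0; i < n; ++i)
        a[i] = b[i] + c[i];
}
```

Compilers that do not know the pragma would be expected to ignore it, possibly with a warning.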


GCC Autovectorization I

GCC Autovectorization Examples

1. Simple Loop
2. Improved Loop
3. Optimized Loop
4. C11 compatible solution
5. Non profitable loop

→ Compiled with the previously shown compiler flags


GCC Autovectorization II

Version 1: Simple Loop

    #define SIZE (1L << 16)
    void simpleLoop(double *a, double *b)
    {
        for (int i = 0; i < SIZE; i++)
        {
            a[i] += b[i];
        }
    }


GCC Autovectorization III

GCC output: Version 1

    simpleLoop.c:4:5: note: loop vectorized
    simpleLoop.c:4:5: note: loop versioned for vectorization because of possible aliasing
    simpleLoop.c:4:5: note: loop peeled for vectorization to enhance alignment

DEMO: Version 1
Resulting assembly code


GCC Autovectorization IV

Version 2: Improved Loop

    #define SIZE (1L << 16)
    void improvedLoop(double *restrict a, double *restrict b)
    {
        for (int i = 0; i < SIZE; i++)
        {
            a[i] += b[i];
        }
    }


GCC Autovectorization V

GCC output: Version 2

    improvedLoop.c:4:5: note: loop vectorized
    improvedLoop.c:4:5: note: loop peeled for vectorization to enhance alignment

DEMO: Version 2
Resulting assembly code


GCC Autovectorization VI

Version 3: Optimized Loop

    #define SIZE (1L << 16)
    #define GCC_ALN(var, alignment) __builtin_assume_aligned(var, alignment)
    void optimizedLoop(double *restrict a, double *restrict b)
    {
        a = (double *) GCC_ALN(a, 32);
        b = (double *) GCC_ALN(b, 32);
        for (int i = 0; i < SIZE; i++)
        {
            a[i] += b[i];
        }
    }


GCC Autovectorization VII

Remark

__builtin_assume_aligned: Caller has to ensure that the memory is aligned → segfault otherwise

GCC output: Version 3

    optimizedLoop.c:7:5: note: loop vectorized

    .L2:
        vmovapd ymm0, YMMWORD PTR [rdi+rax]
        vaddpd  ymm0, ymm0, YMMWORD PTR [rsi+rax]
        vmovapd YMMWORD PTR [rdi+rax], ymm0
        add     rax, 32
        cmp     rax, 524288
        jne     .L2
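How a caller can keep the alignment promise is sketched below. `optimizedLoop` is the Version 3 function from the slides with the macro expanded; the caller `run_optimized_loop` is a hypothetical addition and assumes that C11 `aligned_alloc` is available.

```c
#include <stdlib.h>  /* aligned_alloc, free */

#define SIZE (1L << 16)

/* Loop from the slides: both pointers are promised to be 32-byte aligned. */
void optimizedLoop(double *restrict a, double *restrict b)
{
    a = (double *)__builtin_assume_aligned(a, 32);
    b = (double *)__builtin_assume_aligned(b, 32);
    for (int i = 0; i < SIZE; i++)
        a[i] += b[i];
}

/* Hypothetical caller: allocates two 32-byte aligned arrays, runs the loop,
 * and returns a[SIZE - 1] (or -1.0 if an allocation failed). */
double run_optimized_loop(void)
{
    double *a = aligned_alloc(32, SIZE * sizeof(double));
    double *b = aligned_alloc(32, SIZE * sizeof(double));
    double result = -1.0;
    if (a != NULL && b != NULL) {
        for (long i = 0; i < SIZE; i++) {
            a[i] = 1.0;
            b[i] = 2.0;
        }
        optimizedLoop(a, b);
        result = a[SIZE - 1];
    }
    free(a);
    free(b);
    return result;
}
```

Memory from plain `malloc` is usually aligned to 16 bytes only, which would not be enough for the 32-byte promise made here.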


GCC Autovectorization VIII

C11 compatible solution

    #include <stdalign.h>   /* alignas */

    struct data {
        alignas(32) double vec[SIZE];
    };
    void optimizedLoop(struct data *restrict a, struct data *restrict b)
    {
        for (int i = 0; i < SIZE; i++)
            a->vec[i] += b->vec[i];
    }

- GCC creates exactly the same output
- Advantage: Can be compiled with other compilers
- But: Other compilers may need additional directives/keywords


GCC Autovectorization IX

Empiric Runtime Analysis

    Loop                   Number of cycles (average) ¹
    Simple Loop            106.442
    Improved Loop          105.883
    Optimized Loop          99.719
    Optimized Loop C11      99.540
    Non-vectorized Loop    444.142

Table: Average runtime of the example loops

¹ TSC using rdtscp instruction
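The slides do not show the measurement harness, so the following is only an assumed sketch of how cycle counts like these could be collected with `rdtscp`. The `__rdtscp` intrinsic comes from `<x86intrin.h>` on GCC and Clang; the fallback to `clock()` on other architectures, the function names and the tiny `sampleLoop` are all additions made for illustration.

```c
#include <stdint.h>      /* uint64_t */
#include <time.h>        /* clock, used only as a non-x86 fallback */
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* __rdtscp */
#endif

/* Reads an increasing counter: the TSC on x86, otherwise clock() ticks
 * as a coarse stand-in (assumption, not from the slides). */
static uint64_t read_counter(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int aux;            /* receives IA32_TSC_AUX, unused here */
    return __rdtscp(&aux);
#else
    return (uint64_t)clock();
#endif
}

/* Hypothetical loop to measure: adds 8 doubles. */
void sampleLoop(double *a, double *b)
{
    for (int i = 0; i < 8; i++)
        a[i] += b[i];
}

/* Average counter ticks per call of fn(a, b) over `runs` repetitions. */
uint64_t average_ticks(void (*fn)(double *, double *),
                       double *a, double *b, int runs)
{
    uint64_t total = 0;
    for (int r = 0; r < runs; r++) {
        uint64_t start = read_counter();
        fn(a, b);
        total += read_counter() - start;
    }
    return total / (uint64_t)runs;
}
```

A call such as `average_ticks(sampleLoop, a, b, 1000)` returns the mean tick count; a careful benchmark would additionally pin the thread to one core and discard warm-up runs.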


Autovectorization - Not profitable loops

Non profitable loop

    void nonProfitableLoop(double *a, double *b)
    {
        for (int i = 0; i < 8; i++)
        {
            a[i] += b[i];
        }
    }

GCC output with -fopt-info-vec-missed

    nonProfitableLoop.c:3:5: note: not vectorized: vectorization not profitable.


Autovectorization Requirements and Limitations


Autovectorization Requirements and Limitations

Requirements and Limitations [Cor12]

1. Countable loops
2. No backward loop-carried dependencies
3. No function calls
   - Except vectorizable math functions, e.g. sin, sqrt, ...
4. Straight-line code (only one control flow: no switch)
5. Loop to be vectorized must be innermost loop if nested

→ Intel Vectorization Guidelines [Sab12]
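The first two requirements can be illustrated with three small loops. All function names are hypothetical, and whether a particular compiler version vectorizes a given loop should be checked with the report flags; the comments state what one would typically expect.

```c
#include <stddef.h>  /* size_t */

/* Meets the listed requirements: countable, no loop-carried dependency,
 * no calls, straight-line body. */
void scale_add(double *restrict y, const double *restrict x, double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += k * x[i];
}

/* Violates requirement 2: iteration i reads the value written by
 * iteration i - 1 (a backward loop-carried dependency). */
void prefix_sum(double *x, size_t n)
{
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}

/* Violates requirement 1: the trip count is unknown on entry because the
 * loop exits as soon as a data-dependent condition holds. */
size_t find_first_negative(const double *x, size_t n)
{
    size_t i = 0;
    while (i < n && x[i] >= 0.0)
        i++;
    return i;
}
```

All three functions are correct C; the difference lies only in whether their iterations can be executed in chunks.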


Conclusion


Conclusion I

Vector-aware coding

- Follow the Vectorization Guidelines
- Evaluate compiler reports/output
- Check the resulting assembly code
- Evaluate the performance / binary size


Conclusion II

What we haven’t talked about
- Pipelining
- Cache Utilization


References


References I

[Cor12] Corden, Martyn: Requirements for Vectorizable Loops. (2012). https://software.intel.com/en-us/articles/requirements-for-vectorizable-loops/

[Eva06] Evans, David: x86 Assembly Guide. (2006). http://www.cs.virginia.edu/~evans/cs216/guides/x86.html

[FSF15] Free Software Foundation, Inc.: Using the GNU Compiler Collection (GCC). (2015). https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/


References II

[ISO07] International Organization for Standardization: Programming Languages - C99. Version: 2007. http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf. Geneva, CH, 2007. Standard

[ISO11] International Organization for Standardization: Programming Languages - C - Committee Draft. Version: 2011. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf. Geneva, CH, 2011. Standard


References III

[Lom11] Lomont, Chris: Introduction to Intel® Advanced Vector Extensions. (2011). https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions

[Lom12] Lomont, Chris: Introduction to x64 Assembly. (2012). https://software.intel.com/en-us/articles/introduction-to-x64-assembly


References IV

[Pip12] Piper, Chuck: An Introduction to Vectorization with the Intel® C++ Compiler. (2012). http://d3f8ykwhia686p.cloudfront.net/1live/intel/An_Introduction_to_Vectorization_with_Intel_Compiler_021712.pdf

[Sab12] Sabahi, Mark: A Guide to Auto-vectorization with Intel® C++ Compilers. (2012). https://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-with-intel-c-compilers
