Code Generation for Embedded Heterogeneous Architectures on Android

Oliver Reiche, Richard Membarth, Frank Hannig, and Jürgen Teich

University of Erlangen-Nuremberg

Motivation

What do we need DSLs and code generation for?

3P: Performance, Productivity, and Portability

What’s the difference for embedded heterogeneous architectures?

25-Mar-14, Oliver Reiche / University of Erlangen-Nuremberg

Outline

1. Programming Models

2. Code Generation

– HIPAcc Framework

– Renderscript Code Generation

– Vector Support

– HSA Memory Management

3. Results

Programming Models

Android NDK (Native Development Kit)

• no native support for GPUs

• low-level fine tuning:

– implicit and explicit vectorization (SSE/AVX/NEON)

– cache-aware programming

OpenCL (unofficial)

• support for CPUs, GPGPUs and others

• low-level fine tuning:

– explicit mapping of threads

– transparent memory hierarchy

– supports unified CPU/GPU memory

Renderscript

Renderscript Compute

• code mapping to native threads

• targets CPUs and DSPs

• additionally targets GPUs (since Android 4.2)

Filterscript

• stricter limitations

– relaxed precision

– no scatter writes

– pointers are illegal

• ensures wider compatibility
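
The "no scatter writes" restriction can be pictured with a small model in plain Python (this is not Filterscript; the names `run_kernel` and `box3_gather` are hypothetical and chosen for illustration). A gather-style kernel may read any input position, but it returns exactly one value, and the runtime stores that value at the invocation's own (x, y) coordinate.

```python
def run_kernel(kernel, src, width, height):
    """Model of a launch: one kernel call per output pixel.

    The kernel only returns a value; the 'runtime' (this loop) decides
    where it is stored, so the kernel cannot scatter writes.
    """
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = kernel(src, x, y, width)
    return out


def box3_gather(src, x, y, width):
    """Gather: read up to three horizontal neighbours, return one result."""
    left = src[y][max(x - 1, 0)]            # clamp at the left border
    right = src[y][min(x + 1, width - 1)]   # clamp at the right border
    return (left + src[y][x] + right) // 3


image = [[0, 3, 6, 9]]
result = run_kernel(box3_gather, image, width=4, height=1)
print(result)
```

Because the kernel never receives a handle to the output, there is no way for it to write to a position other than its own, which is the property that lets such kernels run on a wider range of devices.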

Renderscript in Detail

At first sight, Renderscript has many similarities to OpenCL, but it is fundamentally different . . .

Philosophy behind Renderscript

• higher level of programming

• to widen support for different architectures

• dynamic execution on heterogeneous platforms

• decouple the developer from the target hardware

• at the cost of performance

Low-level optimizations are barely possible!

HIPAcc Framework

HIPAcc Framework Overview

HIPAcc Example: Host Code

HIPAcc Example: Kernel Code

Renderscript Code Generation

Renderscript Memory Access

Memory Access Mapping

The DSL kernel is mapped to:

• Filterscript

• Renderscript

• Renderscript (4 pixels per thread)

[Code listings and access-pattern figures are not preserved in this transcript.]

Renderscript Iteration Space

• defined by output buffer size

• no custom launch configuration

When we need fewer threads, e.g., for

• processing multiple pixels per thread

• operating on a fraction of the buffer (ROI)

we need appropriate Iteration Space Mapping
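
As a rough illustration of why a mapping is needed: the runtime launches one thread per element of the output buffer, whereas the kernel may need fewer. The arithmetic below is a sketch in Python; `pixels_per_thread` and the helper names are assumptions made for illustration, not HIPAcc or Renderscript identifiers.

```python
def threads_needed(width, height, pixels_per_thread=1):
    """Threads required when each thread handles several pixels of a row."""
    # Ceiling division: a partially filled last group still needs a thread.
    threads_x = (width + pixels_per_thread - 1) // pixels_per_thread
    return threads_x, height


def threads_launched(out_width, out_height):
    """Renderscript-style launch: one thread per output element."""
    return out_width, out_height


# A 1024x768 image processed with 4 pixels per thread.
needed = threads_needed(1024, 768, pixels_per_thread=4)
launched = threads_launched(1024, 768)
surplus = launched[0] * launched[1] - needed[0] * needed[1]
print(needed, launched, surplus)
```

The surplus threads are the ones an iteration space mapping has to avoid launching or has to neutralize.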

Iteration Space Mapping (3 Approaches)

1. Temporary buffer

– additional memory

– copy overhead: width_ROI × height_ROI

2. Dummy buffer

– allocation overhead for unused buffer

– not suitable for Filterscript

3. Add guards to the kernel

– suitable for Filterscript

– copy overhead: (width_IMG × height_IMG) – (width_ROI × height_ROI)

– minor execution overhead
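
A minimal sketch of the third approach, written as a Python model rather than generated Renderscript. It assumes that the kernel is launched over the full image and that positions outside the ROI pass the existing output pixel through unchanged, which is one reading of the copy overhead listed above. The names `guarded_kernel`, `launch`, and `roi` are hypothetical.

```python
def guarded_kernel(src, dst, x, y, roi):
    """Return the new value for (x, y); outside the ROI keep the old one."""
    x0, y0, w, h = roi
    if not (x0 <= x < x0 + w and y0 <= y < y0 + h):
        return dst[y][x]        # guard: copy the existing pixel through
    return 255 - src[y][x]      # stand-in point operator (invert)


def launch(src, dst, roi):
    """Full-image launch: one call per pixel of the output buffer."""
    height, width = len(dst), len(dst[0])
    return [[guarded_kernel(src, dst, x, y, roi) for x in range(width)]
            for y in range(height)]


src = [[10, 20, 30], [40, 50, 60]]
dst = [[0, 0, 0], [0, 0, 0]]
out = launch(src, dst, roi=(1, 0, 2, 1))   # x0=1, y0=0, 2 wide, 1 high
print(out)
```

In this example 4 of the 6 pixels are merely copied, that is, the image size minus the ROI size, while the guard itself costs one comparison chain per thread.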

Vector Support

Mobile GPUs: SIMD Units

[Figure: single core of the ARM Mali-T604]

Vector support is crucial for performance

Vector Support

• added vector types Tn (e.g., float4)

• added conversion functions Tn convert_Tn(…)

HSA Memory Management

Support for unified CPU/GPU memory

• abstract memory from developer

• implicitly handle memory transfers

• manage map() and unmap() operations

avoid unnecessary memory copies
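
One way to picture this bookkeeping: a wrapper tracks whether a shared buffer is currently mapped for the host and issues map and unmap operations only when that state changes, so the data stays in a single allocation. The following is a Python sketch under that assumption; the class `UnifiedBuffer` and its methods are hypothetical and do not reflect the actual HIPAcc or OpenCL API.

```python
class UnifiedBuffer:
    """Model of a buffer in memory shared by CPU and GPU."""

    def __init__(self, size):
        self.data = [0] * size   # single allocation, visible to both sides
        self.mapped = False      # True while the host may access it
        self.log = []            # records map/unmap calls for inspection

    def host_access(self):
        """Map lazily before the host touches the data."""
        if not self.mapped:
            self.log.append("map")
            self.mapped = True
        return self.data

    def run_kernel(self, kernel):
        """Unmap if needed, then let the 'device' work on the same memory."""
        if self.mapped:
            self.log.append("unmap")
            self.mapped = False
        for i, value in enumerate(self.data):
            self.data[i] = kernel(value)


buf = UnifiedBuffer(4)
buf.host_access()[:] = [1, 2, 3, 4]   # host writes the input in place
buf.run_kernel(lambda v: v * 2)       # device pass
buf.run_kernel(lambda v: v + 1)       # second pass: no extra map/unmap
result = list(buf.host_access())      # host reads the result in place
print(result, buf.log)
```

Two consecutive kernel runs trigger no additional map or unmap, which is the kind of redundant transfer the framework aims to hide from the developer.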

Results

Results: Productivity

Productivity

HIPAcc is

• up to 156x more compact than OpenCV

• up to 780x more compact than generated Renderscript

Lines of Code for implementing different image filters

Results: Performance

Speedup GPU: 5x5 Gaussian Blur on an ARM Mali-T604

Code variants show that the effect of using constant memory is almost negligible (≈5%) on embedded GPUs

Results: Performance

Execution Time HSA (GPU with OpenCL)

Summary

Contributions: We showed

• what kind of optimizations are useful on eGPGPUs

• using DSLs for embedded devices is reasonable: high productivity in describing image filters

• implicit use of unified CPU/GPU memory

HIPAcc Framework Features

• ROI definition

• boundary handling modes

• interpolation modes

• image pyramids

• built-in architecture model

• automatic exploration

• target-specific optimizations

HIPAcc Compiler Features

• exploit full GPU memory hierarchy

• loop unrolling

• constant propagation

• multiple pixels per thread

• forced use of textures

• vectorization (point operators)

• unified CPU/GPU memory support

Questions?

HIPAcc framework sources released under Simplified BSD License.

http://hipacc-lang.org

University Booth Demonstration: Wednesday, 12 p.m. & 4 p.m.

Results: Performance

Speedup CPU

