ScilabTEC 2015 - KIT

Date post: 29-Jul-2015
Upload: scilab-enterprises
Page 1: ScilabTEC 2015 - KIT

FP7-ICT-2011-7-287733 – ScilabTEC – Oliver Oey – [email protected] 1

FP7-ICT-2011-7-287733

ALMA Project Overview

Simplifying programming for multi-cores

Oliver Oey

Page 2: ScilabTEC 2015 - KIT

Outline

• ALMA EU Project
  • Project Overview
  • Motivation
  • Results
• MatrixFrontend
  • Type inference
  • Loopify
  • Simplify
• emmtrix Technologies
• Summary

Page 3: ScilabTEC 2015 - KIT

ALMA Project ID Card

Three year project: 01/09/2011 – 31/01/2015

Funded by FP7: 3.2 Million Euros

Official web site: http://www.alma-project.eu/

Coordinator: Juergen Becker (KIT) and Timo Stripf (KIT)

Scientific Coordinator: Nikos Voros (TWG)

Page 4: ScilabTEC 2015 - KIT

Why do we need multi-core processors?

Until ~2005, processor performance increases were driven by
• Clock speed
• Execution optimization
• Cache

Limits reached
• Power wall
• ILP wall

This led to multicore processors

Parallelism must be exposed by the programmer

(source http://www.gotw.ca/publications/concurrency-ddj.htm)

Page 5: ScilabTEC 2015 - KIT

Motivation

End user perspective
• Explore/develop algorithms
• Use a simple, comfortable language
  • E.g. Matlab, Scilab, …
• Don’t want to care about
  • data types
  • parallelism
• End result
  • Performance
  • Energy efficient
  • Cost efficient
  • Fast development time

Target architecture perspective
• Multi-Processor System-on-Chip
• Parallel processor cores
  • Explicit parallel programming
  • Distributed memory model, e.g. MPI
• Parallelism within the processor cores
  • Single Instruction Multiple Data
  • Very Long Instruction Word
• Native data types
  • E.g. 32-bit integer
  • Floating-point performs inefficiently

Hide the complexity from the end user

Page 6: ScilabTEC 2015 - KIT

ALMA Development Flow (overview)

[Figure: ALMA development flow. Inputs: embedded application design (translation to Scilab & annotations) and multi-core hardware design (abstract hardware description (ADL), structural hardware description). Tools: ALMA algorithm parallelization tools, producing C-based code with parallel descriptions; KIT C-compiler and Recore C-compiler, producing an executable binary (for simulator and HW); multi-core simulator. Other labels: parameters for algorithm optimization, feedback for optimization. Output: optimized application code on multi-core platform.]

Page 7: ScilabTEC 2015 - KIT

Challenges for Compiling Scilab to MPSoCs

Scilab programming language
• Sequential, imperative language
• Dynamic typing (scalars, vectors, matrices)
• End users typically use floating-point data types
• Pointer-free, i.e. no memory aliasing problems
• Natural parallelism within vector operations

MPSoC target architectures
• Exploit coarse-grain parallelism (task-level)
• Distributed memory
• Exploit fine-grain parallelism (instruction-level)

Page 8: ScilabTEC 2015 - KIT

ALMA Target Architectures

Xentium® processing tile
• Fixed-point DSP processing
• 10-issue VLIW processor
• SIMD capability
• Streaming communication services

Multicore architectures
• Distributed memory => no shared memory required
• No floating point unit => use fixed-point arithmetic

Example architecture: Recore X2014
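
To illustrate what "use fixed-point arithmetic" can mean for generated code, here is a minimal C sketch of a Q15 multiplication. The Q15 format, the type name q15_t and both function names are assumptions chosen for illustration; the slides do not state which fixed-point format the ALMA tools or the Xentium use.

```c
#include <stdint.h>

/* Hypothetical Q15 format: a 16-bit integer raw represents raw / 32768. */
typedef int16_t q15_t;

/* Convert a double in [-1, 1) to Q15 (truncating, no saturation). */
q15_t q15_from_double(double x)
{
    return (q15_t)(x * 32768.0);
}

/* Multiply two Q15 values using integer arithmetic only. */
q15_t q15_mul(q15_t a, q15_t b)
{
    /* The 32-bit product is in Q30; dividing by 2^15 rescales it to Q15.
     * Division is used instead of >> because right-shifting a negative
     * value is implementation-defined in C.
     * Note: (-1.0) * (-1.0) does not fit in Q15 and is not handled here. */
    int32_t product = (int32_t)a * (int32_t)b;
    return (q15_t)(product / 32768);
}
```

For example, q15_from_double(0.5) gives the raw value 16384, and multiplying it by itself gives 8192, which represents 0.25.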

Page 9: ScilabTEC 2015 - KIT

Application Test Cases - Telecommunications

[Figure: block diagram of the IEEE 802.16e PHY layer in an NT x NR MIMO configuration, with the portions covered by the ALMA 1st and 2nd increments marked. Base station receive path (BS Rx, antennas Rx 1 … Rx NR) blocks: cyclic prefix removal, FFT, channel estimator, equalizer, diversity combiner, symbol deconstruction, constellation demapper, deinterleaver, FEC decoder, derandomizer, uplink frame deconstruction, SDU generation, MAC-PHY interface. Base station transmit path (BS Tx, antennas Tx 1 … Tx NT) blocks: UL/DL scheduler, PDU generation, downlink MAC/PHY control, MAC-PHY interface, UL/DL frame mapper, frame construction, randomizer, FEC encoder, interleaver, constellation mapping, S-T coding, symbol construction, IFFT + cyclic prefix (+ preamble).]

Speedup: ~2.8

Page 10: ScilabTEC 2015 - KIT

Application Test Cases – Image Processing

Scale Invariant Feature Transform (SIFT)

Speedup: ~1.8

Page 11: ScilabTEC 2015 - KIT

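Development effort

[Figure: bar chart of development effort in working days (axis 0 to 80) for the telecommunication and image processing test cases, comparing manual parallelization with automatic parallelization. Annotated reductions: -57% and -30%.]

Development effort reduced by more than 50% in some cases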

Page 12: ScilabTEC 2015 - KIT

ALMA Workflow

[Figure: ALMA workflow. Labels: Development using Scilab, ALMA Parallelization Tools, Parallel C Code, Development Cycle I, Development Cycle II, Testing PC, Testing platform (multi-core processor with four CPUs).]

Page 13: ScilabTEC 2015 - KIT

ALMA Workflow (Details)

[Figure: ALMA workflow in detail. Stages: Development with Scilab -> (Matrix Frontend) -> Sequential Static C Code -> (Parallelization) -> Parallel C Code. Feedback loops: Development Cycle I and Development Cycle II.]

Page 14: ScilabTEC 2015 - KIT

Outline

• ALMA EU Project
  • Project Overview
  • Motivation
  • Results
• MatrixFrontend
  • Type inference
  • Loopify
  • Simplify
• emmtrix Technologies
• Summary

Page 15: ScilabTEC 2015 - KIT

Matrix Frontend

[Figure: position of the Matrix Frontend in the ALMA workflow, between development with Scilab and the sequential static C code that is passed on to parallelization.]

Scilab-to-C Compiler
• Parses Scilab code
• Advanced type inference
• High-level optimizations on Scilab code
• Turns Scilab statements into loop nests

Generated C Code
• Optimized for parallelism extraction
• Static memory allocation
• Avoid pointers

Page 16: ScilabTEC 2015 - KIT

Requirements

Source language
• Support Scilab input language
  • Support well-defined subset
• Extend with annotation
  • for type inference
  • for parallelization
• Annotated code should still be valid Scilab/Matlab code

Target language
• Generate ANSI C89 code
• Polyhedral code
  • Large Static Control Parts
  • Avoid pointers
• Static code
  • No dynamic memory allocation
  • Avoid run-time decisions

Page 17: ScilabTEC 2015 - KIT

Type Inference

Calculate types for expressions and variables

“Type” = “Data Type” + “Shape”

Separated into 3 passes

1. Shape Inference

2. Data Type Inference

3. Variable Inference

[Figure: MatrixFrontend pipeline. Scilab -> Type Inference -> Loopify -> Simplify -> C Code Output -> C Code]

Page 18: ScilabTEC 2015 - KIT

Type Inference - Shape

Calculate shape of each Scilab statement

s = [1 2 3]; // s = 1x3

for f = 1:10 // f = 1x1

s = s + f // s = 1x3

end
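
Once every shape is known, each statement can be mapped to loops with fixed bounds. The following is a minimal C sketch of the loop above under the inferred shapes (s is 1x3, f is 1x1). The function name accumulate and the choice of double are assumptions for illustration; this is not actual MatrixFrontend output.

```c
#include <stdint.h>

/* s has inferred shape 1x3; f has inferred shape 1x1 (a scalar). */
void accumulate(double s[3])
{
    int32_t f;
    int32_t i;

    for (f = 1; f <= 10; ++f) {      /* for f = 1:10 */
        for (i = 0; i < 3; ++i) {    /* s = s + f, one element at a time */
            s[i] = s[i] + (double)f;
        }
    }
}
```

Because the shape of s never changes inside the loop, the inner bound is the compile-time constant 3 and no size bookkeeping is needed at run time.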

Page 19: ScilabTEC 2015 - KIT

Type Inference – Growing Arrays

Support growing arrays

a = 1; a(1,5) = 1; // a = [1 0 0 0 1]

Maximum size must be known!

What happens if the matrix is indexed by a variable?

a(1,b) = 1; // Maximum value of b unknown

Two solutions:

a = zeros(1,5);

mfe_fixedsize(a);

a = 1;

a(1,b) = 1;

a = 1;

mfe_size(a, 1, 1:5);

a(1,b) = 1;
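
A minimal C sketch of how a growing array can live in static storage once a maximum size is known. It assumes that mfe_size(a, 1, 1:5) declares a as having 1 row and between 1 and 5 columns; that reading of the annotation, and the names A_MAX_COLS, a_cols, a_init and a_set, are assumptions made for illustration.

```c
#include <stdint.h>

#define A_MAX_COLS 5                /* assumed bound taken from the annotation */

static double  a_data[A_MAX_COLS];  /* static storage at the maximum size */
static int32_t a_cols;              /* current (dynamic) number of columns */

/* a = 1; */
void a_init(void)
{
    a_data[0] = 1.0;
    a_cols = 1;
}

/* a(1,b) = value; with 1-based index b.
 * Returns 0 on success, -1 if b lies outside the declared maximum size. */
int a_set(int32_t b, double value)
{
    int32_t i;

    if (b < 1 || b > A_MAX_COLS) {
        return -1;
    }
    /* Growing the array zero-fills the newly exposed elements. */
    for (i = a_cols; i < b; ++i) {
        a_data[i] = 0.0;
    }
    a_data[b - 1] = value;
    if (b > a_cols) {
        a_cols = b;
    }
    return 0;
}
```

Calling a_init() and then a_set(5, 1.0) leaves the contents [1 0 0 0 1] with a_cols equal to 5, without any dynamic memory allocation.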

Page 20: ScilabTEC 2015 - KIT

Type Inference – Data Type

Scilab provides data type conversion functions

double

int32, int16, int8

uint32, uint16, uint8

boolean

complex, real, imag

a = uint8([255 256]); // a = [255 0]
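
The example shows wrap-around on conversion: 256 becomes 0. A minimal C sketch of a conversion with the same modulo-256 behavior follows; the function name to_uint8_wrap is hypothetical, and this is not claimed to be the code MatrixFrontend emits.

```c
#include <stdint.h>

/* Convert a double to uint8 with wrap-around (modulo 256).
 * Assumes the truncated value of x fits in a 64-bit signed integer. */
uint8_t to_uint8_wrap(double x)
{
    /* Go through a wide signed integer first: converting an out-of-range
     * double directly to uint8_t is undefined behavior in C, whereas an
     * integer-to-unsigned conversion is defined as reduction modulo 256. */
    return (uint8_t)(int64_t)x;
}
```

With this helper, 255.0 maps to 255 and 256.0 maps to 0, matching the result on the slide.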

Page 21: ScilabTEC 2015 - KIT

Type Inference – Data Type (2)

Problem: Data type is run-time specific

sqrt(1) => double

sqrt(-1) => complex double

sqrt(a) => ?

We cannot guarantee Scilab-conformant execution

Solution
• Generate a warning
• Ask the end user to specify the data type

real(sqrt(a)) => double

sqrt(complex(a)) => complex double

Page 22: ScilabTEC 2015 - KIT

Type Inference – Variable

Shape and data type inference operate on expressions

Assign shape/data type to variables

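Data type

Limitation: the data type cannot change at run time

a = 1;
a = uint8(1); // not supported: the data type of a would change

The complex flag is “or”-connected: a variable is complex if any assignment to it is complex

a = 1;
a = %i;

Resulting C declaration: complex_double_t a;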

Page 23: ScilabTEC 2015 - KIT

Type Inference – Variable (2)

Shape

Variable shape is maximum of all dimensions

a = zeros(1,3);

a = zeros(4,1);

double a[4,3];

Limitation: Number of dimensions cannot change

a = zeros(3,3);

a = zeros(3,3,3);
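
The variable inference rules from this and the previous slide (data type fixed, complex flag “or”-connected, shape as per-dimension maximum, number of dimensions fixed) can be sketched as one merge step per assignment. All type and function names below (base_type_t, var_type_t, merge_type) are hypothetical and only illustrate the rules; they are not taken from MatrixFrontend.

```c
#include <stdint.h>

#define MAX_DIMS 3

/* Hypothetical subset of base data types. */
typedef enum { DT_DOUBLE, DT_INT32, DT_UINT8 } base_type_t;

typedef struct {
    base_type_t base;        /* base data type */
    int         is_complex;  /* complex flag */
    int32_t     ndims;       /* number of dimensions */
    int32_t     dims[MAX_DIMS];
} var_type_t;

/* Merge the type of one more assignment (rhs) into the variable type.
 * Returns 0 on success, -1 if one of the stated limitations is violated. */
int merge_type(var_type_t *var, const var_type_t *rhs)
{
    int32_t d;

    if (var->base != rhs->base) {
        return -1;                   /* data type cannot change */
    }
    if (var->ndims != rhs->ndims) {
        return -1;                   /* number of dimensions cannot change */
    }
    /* Complex flag is "or"-connected. */
    var->is_complex = var->is_complex || rhs->is_complex;
    /* Variable shape is the maximum of all dimensions. */
    for (d = 0; d < var->ndims; ++d) {
        if (rhs->dims[d] > var->dims[d]) {
            var->dims[d] = rhs->dims[d];
        }
    }
    return 0;
}
```

Merging the shapes 1x3 and 4x1 yields 4x3, as in the zeros(1,3) / zeros(4,1) example, while merging a 3x3 type with a 3x3x3 type is rejected.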

Page 24: ScilabTEC 2015 - KIT

Loopify

Translates Matlab/Scilab variables into
• Data
• Dynamic size
• Static (maximum) size

Translates Matlab/Scilab statements into
• Loop nest
• Size calculation

Scilab:

a = zeros(2,3);

C code:

int32_t a_data[3][2] = {{0}};
int32_t a_size[2];
const int32_t a_ssize[2] = {2, 3};

for (v1 = 0; v1 < 3; ++v1) {
    for (v0 = 0; v0 < 2; ++v0) {
        a_data[v1][v0] = 0;
    }
}
a_size[0] = 2;
a_size[1] = 3;
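
The same pattern (loop nest plus size calculation) applies to statements with operands. Below is a minimal C sketch of how an element-wise c = a + b might look, assuming both operands have the same dynamic size and reusing the data[column][row] layout of a_data[3][2] above. The function name mat_add and the macros MAX_ROWS and MAX_COLS are hypothetical.

```c
#include <stdint.h>

#define MAX_ROWS 2   /* static (maximum) size, dimension 0 */
#define MAX_COLS 3   /* static (maximum) size, dimension 1 */

/* Sketch of c = a + b; a and b are assumed to have the same dynamic size. */
void mat_add(int32_t a_data[MAX_COLS][MAX_ROWS], const int32_t a_size[2],
             int32_t b_data[MAX_COLS][MAX_ROWS],
             int32_t c_data[MAX_COLS][MAX_ROWS], int32_t c_size[2])
{
    int32_t v0;
    int32_t v1;

    /* Loop nest: bounds come from the dynamic size,
     * storage is declared with the static (maximum) size. */
    for (v1 = 0; v1 < a_size[1]; ++v1) {
        for (v0 = 0; v0 < a_size[0]; ++v0) {
            c_data[v1][v0] = a_data[v1][v0] + b_data[v1][v0];
        }
    }

    /* Size calculation: the result has the shape of the operands. */
    c_size[0] = a_size[0];
    c_size[1] = a_size[1];
}
```

Every iteration of the nest is independent of the others, which is the property that later parallelization steps can exploit.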

Page 25: ScilabTEC 2015 - KIT

Simplify

• Remove unnecessary “for loops”
• Remove unnecessary variable dimensions
• Remove size variables and statements for fixed size variables

Scilab:

a = 1;

C code (before simplify):

int32_t a_data[1][1] = {{0}};
…
for (v1 = 0; v1 < 1; ++v1) {
    for (v0 = 0; v0 < 1; ++v0) {
        a_data[v1][v0] = 1;
    }
}

C code (after simplify):

int32_t a_data = 0;
…
a_data = 1;

Page 26: ScilabTEC 2015 - KIT

Results – Lines of Code

[Figure: bar chart of lines of code (axis 0 to 5000) for the SIFT, Magic, IFFT and Intracom test cases, comparing the Scilab source, the C code before Simplify and the C code after Simplify.]

Page 27: ScilabTEC 2015 - KIT

Start-up company

Solutions for a parallel world

To be founded as a KIT spin-off, building on results from ALMA

www.emmtrix.com

Page 28: ScilabTEC 2015 - KIT

Interactive Parallelization

Control parallelization by high-level decisions in GUI
• Control, Traceability, Usability

Automatic test generation
• Reliability

Page 29: ScilabTEC 2015 - KIT

emmtrix Workflow Integration

[Figure: emmtrix workflow. Labels: Development with Scilab, emmtrix Parallelization Solution, Parallel C Code, Verification, Iteration, Test PC, Test Platform (multicore processor with four CPUs).]

Integration into Scilab workflow

Planned Xcos integration for model-based design

Page 30: ScilabTEC 2015 - KIT

Plans for emmtrix

Soon: Release of MatrixFrontend for Scilab community
• Free to use
• Convert Scilab code to C code

Product launch of emmtrix Parallel Studio (not final name) at Embedded World 2016 (Feb 2016)
• Integration into workflow
• Support for different hardware platforms
• Support for model-based design

Page 31: ScilabTEC 2015 - KIT

Summary

ALMA Toolchain
• MatrixFrontend: Convert Scilab code to C
• Parallelization of generated code
• Speed up development for multi-core systems by 30-60%

emmtrix Technologies
• Distribution of ALMA results
• Free Scilab to C converter: Matrix Frontend
• Interactive parallelization tool

Page 32: ScilabTEC 2015 - KIT

Thank you !

