Date posted: 29-Jul-2015
Uploaded by: scilab-enterprises
FP7-ICT-2011-7-287733 – ScilabTEC – Oliver Oey – [email protected] 1
FP7-ICT-2011-7-287733
ALMA Project Overview
Simplifying programming for multi-cores
Oliver Oey
Outline
ALMA EU Project
  Project Overview
  Motivation
  Results
MatrixFrontend
  Type Inference
  Loopify
  Simplify
emmtrix Technologies
Summary
ALMA Project ID Card
Three-year project: 01/09/2011 – 31/01/2015
Funded by FP7: 3.2 million euros
Official web site: http://www.alma-project.eu/
Coordinators: Juergen Becker (KIT) and Timo Stripf (KIT)
Scientific Coordinator: Nikos Voros (TWG)
Why do we need multi-core processors?
Until ~2005, processor performance increases were driven by
• Clock speed
• Execution optimization
• Cache
Then the power wall and the ILP (instruction-level parallelism) wall were reached
This led to multi-core processors
Parallelism must now be exposed by the programmer
(source http://www.gotw.ca/publications/concurrency-ddj.htm)
Motivation
End user perspective
• Explore/develop algorithms
• Use a simple, comfortable language, e.g. Matlab, Scilab, …
• Don’t want to care about data types or parallelism
• End result: performance, energy efficiency, cost efficiency, fast development time
Target architecture perspective
• Multi-Processor System-on-Chip (MPSoC)
• Parallel processor cores: explicit parallel programming, distributed memory model (e.g. MPI)
• Parallelism within the processor cores: Single Instruction Multiple Data (SIMD), Very Long Instruction Word (VLIW)
• Native data types, e.g. 32-bit integer; floating-point operations are inefficient
Hide the complexity from the end user
ALMA Development Flow (overview)
Figure: ALMA development flow (overview). From embedded application design (translation to Scilab & annotations) and multi-core hardware design (abstract hardware description, ADL; structural hardware description), the ALMA algorithm parallelization tools generate C-based code with parallel descriptions. The KIT and Recore C-compilers produce an executable binary (for simulator and HW); the multi-core simulator provides parameters for algorithm optimization and feedback for optimization. Result: optimized application code on a multi-core platform.
Challenges for Compiling Scilab to MPSoCs
Scilab programming language
• Sequential, imperative language
• Dynamic typing (scalars, vectors, matrices)
• End users typically use floating-point data types
• Pointer-free, i.e. no memory aliasing problems
• Natural parallelism within vector operations
MPSoC target architectures
• Exploit coarse-grain parallelism (task level)
• Distributed memory
• Exploit fine-grain parallelism (instruction level)
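To illustrate why the pointer-free nature of Scilab helps, here is a small C sketch; the function names scale2_pointer and scale2_vector are hypothetical. It mimics the Scilab statement x(2:4) = 2*x(1:3). With overlapping pointers, a straightforward loop reads values it has just overwritten, whereas Scilab's vector semantics evaluate the whole right-hand side first. Only the second form has independent iterations that a tool could safely distribute across cores.

```c
#include <stddef.h>

#define MAX_LEN 16 /* arbitrary temporary-buffer limit for this sketch */

/* Pointer-based loop: if dst overlaps src, later iterations read
   elements that earlier iterations already overwrote. */
void scale2_pointer(double *dst, const double *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        dst[i] = 2.0 * src[i];
    }
}

/* Scilab-style vector semantics: evaluate the complete right-hand side
   into a temporary, then assign. Each loop iteration is independent.
   n must not exceed MAX_LEN. */
void scale2_vector(double *dst, const double *src, size_t n)
{
    double tmp[MAX_LEN];
    size_t i;
    for (i = 0; i < n && i < MAX_LEN; ++i) {
        tmp[i] = 2.0 * src[i];
    }
    for (i = 0; i < n && i < MAX_LEN; ++i) {
        dst[i] = tmp[i];
    }
}
```

For x = [1 2 3 4], calling scale2_vector(x + 1, x, 3) yields [1 2 4 6] as Scilab does, while scale2_pointer(x + 1, x, 3) yields [1 2 4 8].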
ALMA Target Architectures
Xentium® processing tile
• Fixed-point DSP processing
• 10-issue VLIW processor
• SIMD capability
• Streaming communication services
Multi-core architectures
• Distributed memory => no shared memory required
No floating-point unit => use fixed-point arithmetic
Example architecture: Recore X2014
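Without a floating-point unit, real-valued algorithms have to be mapped to integer arithmetic. A common choice is the Q15 format, where a 16-bit signed integer v represents v / 32768. The sketch below assumes Q15 purely for illustration; the slide does not say which fixed-point formats ALMA uses, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Q15: a 16-bit signed value v represents v / 32768, range [-1, 1). */
typedef int16_t q15_t;

/* Host-side helper (e.g. for preparing test data): convert a double
   to Q15 with round-to-nearest and saturation. */
q15_t q15_from_double(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0) return 32767;
    if (scaled <= -32768.0) return -32768;
    return (q15_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Multiply two Q15 numbers using only integer operations:
   32-bit product (Q30), round, shift back to Q15, saturate.
   Assumes an arithmetic right shift for negative values, as on
   common compilers. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b; /* Q30 product */
    p = (p + (1 << 14)) >> 15;           /* round and rescale to Q15 */
    if (p > 32767) p = 32767;            /* only (-1) * (-1) overflows */
    if (p < -32768) p = -32768;
    return (q15_t)p;
}
```

For example, 0.5 is 16384 in Q15, and q15_mul(16384, 16384) gives 8192, i.e. 0.25.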
Application Test Cases - Telecommunications
Figure: IEEE 802.16e PHY layer in NT x NR MIMO configuration, with blocks assigned to the ALMA 1st and 2nd increments. BS Tx chain: MAC-PHY I/F, PDU generation, data SDUs, frame construction, randomizer, FEC encoder, interleaver, constellation mapping, symbol construction, S-T coding, IFFT + cyclic prefix (+ preamble) for antennas Tx 1 … Tx NT, together with the UL/DL scheduler, UL/DL frame mapper and downlink MAC/PHY control. BS Rx chain: antennas Rx 1 … Rx NR, cyclic prefix removal, FFT, channel estimator, equalizer, diversity combiner, symbol deconstruction, constellation demapping, deinterleaver, FEC decoder, derandomizer, uplink frame deconstruction, SDU generation, data SDUs, MAC-PHY I/F.
Speedup: ~2.8
Application Test Cases – Image Processing
Scale Invariant Feature Transform (SIFT)
Speedup: ~1.8
Development effort
Figure: Development effort in working days (y axis 0 to 80), manual vs. using automatic parallelization: telecommunication -57%, image processing -30%.
Development effort reduced by over 50% in some cases
ALMA Workflow
Figure: ALMA workflow. Development Cycle I: development using Scilab, tested on a PC. The ALMA parallelization tools generate parallel C code. Development Cycle II: testing on the multi-core processor testing platform.
ALMA Workflow (Details)
Figure: ALMA workflow (details). Development Cycle I: development with Scilab; the MatrixFrontend translates it into sequential static C code. Development Cycle II: parallelization turns the sequential static C code into parallel C code.
Matrix Frontend
Figure: Position of the MatrixFrontend in the flow: development with Scilab => MatrixFrontend => sequential static C code => parallelization => parallel C code.
Scilab-to-C compiler
• Parses Scilab code
• Advanced type inference
• High-level optimizations on Scilab code
• Turns Scilab statements into loop nests
Generated C code
• Optimized for parallelism extraction
• Static memory allocation
• Avoids pointers
Requirements
Source language
• Support the Scilab input language
  – Support a well-defined subset
• Extend it with annotations
  – for type inference
  – for parallelization
  – Annotated code should still be valid Scilab/Matlab code
Target language
• Generate ANSI C89 code
• Polyhedral code
  – Large static control parts
  – Avoid pointers
• Static code
  – No dynamic memory allocation
  – Avoid run-time decisions
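A static control part (SCoP) is a code region whose loop bounds, conditions and array subscripts are affine functions of the enclosing loop counters and constants; this is the form that polyhedral tools can analyze and parallelize. The C89-style sketch below shows such a region with statically allocated arrays and no pointers. The sizes and names (ROWS, COLS, a_data, rowsum_data, rowsum) are hypothetical and only loosely follow the naming used on the Loopify slide.

```c
/* Hypothetical fixed sizes; in generated code these would be the
   inferred maximum sizes of the variables. */
#define ROWS 4
#define COLS 3

/* Statically allocated, column-major storage: no malloc, no pointers. */
static int a_data[COLS][ROWS];
static int rowsum_data[ROWS];

/* Static control part: every loop bound and subscript is an affine
   expression of v0, v1 and constants, with no data-dependent branches.
   A loop such as "while (a_data[0][i] != 0) ++i;" would NOT qualify,
   because its trip count depends on run-time data. */
void rowsum(void)
{
    int v0, v1;
    for (v0 = 0; v0 < ROWS; ++v0) {
        rowsum_data[v0] = 0;
    }
    for (v1 = 0; v1 < COLS; ++v1) {
        for (v0 = 0; v0 < ROWS; ++v0) {
            rowsum_data[v0] += a_data[v1][v0];
        }
    }
}
```

Because the iterations over v0 write disjoint elements of rowsum_data, a polyhedral tool can prove that the inner loop may run in parallel.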
Type Inference
Calculate types for expressions and variables
“Type” = “Data Type” + “Shape”
Separated into 3 passes
1. Shape Inference
2. Data Type Inference
3. Variable Inference
Scilab → Type Inference → Loopify → Simplify → C Code Output → C Code
Type Inference - Shape
Calculate shape of each Scilab statement
s = [1 2 3]; // s = 1x3
for f = 1:10 // f = 1x1
s = s + f // s = 1x3
end
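Shape inference can be pictured as a function over operand shapes. The sketch below assumes a simplified rule for element-wise + (a 1x1 operand is expanded to the other operand's shape; otherwise both shapes must match) and uses hypothetical names. It reproduces the s = s + f line above, where a 1x3 and a 1x1 value give 1x3.

```c
/* Hypothetical shape record for 2-D Scilab values. */
typedef struct {
    int rows;
    int cols;
    int valid; /* 0 if the operand shapes are incompatible */
} shape_t;

shape_t shape_make(int rows, int cols)
{
    shape_t s;
    s.rows = rows;
    s.cols = cols;
    s.valid = 1;
    return s;
}

/* Simplified shape rule for element-wise '+'. */
shape_t infer_add_shape(shape_t a, shape_t b)
{
    shape_t err = shape_make(0, 0);
    err.valid = 0;
    if (!a.valid || !b.valid) return err;
    if (a.rows == 1 && a.cols == 1) return b; /* scalar + matrix */
    if (b.rows == 1 && b.cols == 1) return a; /* matrix + scalar */
    if (a.rows == b.rows && a.cols == b.cols) return a;
    return err; /* e.g. 1x3 + 3x1 is rejected under this rule */
}
```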
Type Inference – Growing Arrays
Support growing arrays
a = 1;
a(1,5) = 1; // a = [1 0 0 0 1]
The maximum size must be known. What happens if the matrix is indexed by a variable?
a(1,b) = 1; // Maximum value of b unknown
Two solutions:
Solution 1 (mfe_fixedsize annotation):
a = zeros(1,5);
mfe_fixedsize(a);
a = 1;
a(1,b) = 1;
Solution 2 (mfe_size annotation):
a = 1;
mfe_size(a, 1, 1:5);
a(1,b) = 1;
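One way to picture the generated code for a growing array is fixed storage sized to the maximum, plus a run-time size. The sketch below assumes that mfe_size(a, 1, 1:5) means "one row, between 1 and 5 columns"; that reading, the storage layout and the helper names (a_assign_scalar, a_set) are assumptions, not the MatrixFrontend's actual output.

```c
#define A_MAX_COLS 5 /* static maximum, assumed to come from mfe_size(a, 1, 1:5) */

static double a_data[A_MAX_COLS]; /* storage fixed at compile time */
static int a_cols;                /* current (dynamic) number of columns */

/* a = 1; */
void a_assign_scalar(double v)
{
    a_data[0] = v;
    a_cols = 1;
}

/* a(1, col) = v; with a 1-based column index.
   Growing fills the new elements with 0, matching the [1 0 0 0 1]
   example. Returns 0 if col exceeds the static maximum. */
int a_set(int col, double v)
{
    int i;
    if (col < 1 || col > A_MAX_COLS) return 0;
    for (i = a_cols; i < col; ++i) {
        a_data[i] = 0.0;
    }
    if (col > a_cols) a_cols = col;
    a_data[col - 1] = v;
    return 1;
}
```

The bounds check is the run-time cost of an index like b whose value is unknown at compile time; with a known maximum, the storage itself stays static.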
Type Inference – Data Type
Scilab provides data type functions:
double
int32, int16, int8
uint32, uint16, uint8
boolean
complex, real, imag
a = uint8([255 256]); // a = [255 0]
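The [255 0] result reflects wrap-around (modulo 256) conversion; MATLAB's uint8 saturates instead and would give [255 255]. Generated C therefore has to reproduce the Scilab behavior explicitly. The helper below is a hypothetical sketch: truncating fractional values toward zero is an assumption, and it relies on C's rule that conversion to an unsigned type is performed modulo 2^8 for uint8_t.

```c
#include <stdint.h>

/* Hypothetical Scilab-style conversion to uint8 with wrap-around.
   Assumption: fractional parts are truncated toward zero.
   The value must fit in a long; NaN/Inf handling is omitted. */
uint8_t to_uint8_wrap(double x)
{
    long t = (long)x;  /* truncate toward zero */
    return (uint8_t)t; /* conversion to unsigned is modulo 256 */
}
```

A saturating variant (MATLAB-style) would instead clamp t to the range 0 to 255 before the cast.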
Type Inference – Data Type (2)
Problem: the data type can depend on run-time values
sqrt(1) => double
sqrt(-1) => complex double
sqrt(a) => ?
Scilab-conformant execution cannot be guaranteed
Solution: generate a warning
Ask the end user to specify the data type
real(sqrt(a)) => double
sqrt(complex(a)) => complex double
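The reasoning above can be modeled as a small type-inference rule. In this sketch (all names hypothetical, with a deliberately reduced type lattice), sqrt returns an unresolved type when the sign of a real argument is unknown, and wrapping the expression in real() or the argument in complex() resolves it, so no warning is needed.

```c
/* Hypothetical, reduced type lattice for this sketch. */
typedef enum { DT_DOUBLE, DT_COMPLEX_DOUBLE, DT_UNKNOWN } dtype_t;

/* What the compiler can prove about the sign of a real argument. */
typedef enum { SIGN_NONNEG, SIGN_NEG, SIGN_UNKNOWN } sign_t;

/* Result type of sqrt(x). */
dtype_t infer_sqrt(dtype_t arg, sign_t sign)
{
    if (arg == DT_COMPLEX_DOUBLE) return DT_COMPLEX_DOUBLE; /* sqrt(complex(a)) */
    if (arg == DT_UNKNOWN) return DT_UNKNOWN;
    switch (sign) {
    case SIGN_NONNEG: return DT_DOUBLE;         /* sqrt(1)  => double */
    case SIGN_NEG:    return DT_COMPLEX_DOUBLE; /* sqrt(-1) => complex double */
    default:          return DT_UNKNOWN;        /* sqrt(a)  => ? */
    }
}

/* real(x) always yields a real double. */
dtype_t infer_real(dtype_t arg)
{
    (void)arg;
    return DT_DOUBLE;
}

/* complex(x) always yields a complex double. */
dtype_t infer_complex(dtype_t arg)
{
    (void)arg;
    return DT_COMPLEX_DOUBLE;
}

/* A warning is required when an unresolved type would reach a variable. */
int needs_warning(dtype_t t)
{
    return t == DT_UNKNOWN;
}
```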
Type Inference – Variable
Shape and data type inference operate on expressions
Assign shape/data type to variables
Data type
Limitation: a variable's data type cannot change at run time (not supported):
a = 1;
a = uint8(1);
The complex flag is OR-combined over all assignments:
a = 1;
a = %i;
=> complex_double_t a;
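The variable-level data-type rule can be sketched as a merge over all assignments to a variable: the base type must stay the same, while the complex flag is OR-ed. The type names and the merge_dtype helper below are hypothetical.

```c
/* Hypothetical subset of Scilab base types. */
typedef enum { BT_DOUBLE, BT_INT32, BT_UINT8 } base_t;

typedef struct {
    base_t base;    /* must stay the same for the whole variable */
    int is_complex; /* sticky: once complex, always complex */
} vtype_t;

/* Merge the type of one more assignment into the variable's type.
   Returns 0 if the base type would change (unsupported), 1 otherwise. */
int merge_dtype(vtype_t *var, vtype_t assigned)
{
    if (var->base != assigned.base) return 0;
    var->is_complex = var->is_complex || assigned.is_complex;
    return 1;
}
```

Merging a = 1 (real double) with a = %i (complex double) succeeds and sets the complex flag, which is why the variable is declared complex_double_t; merging with uint8(1) fails.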
Type Inference – Variable (2)
Shape
The variable's shape is the per-dimension maximum over all assignments:
a = zeros(1,3);
a = zeros(4,1);
=> double a[4,3];
Limitation: the number of dimensions cannot change (not supported):
a = zeros(3,3);
a = zeros(3,3,3);
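The shape rule above can be sketched in the same style: take the maximum of each dimension over all assignments, and reject assignments that change the number of dimensions. The MAX_DIMS limit, the vshape_t type and merge_shape are hypothetical.

```c
#define MAX_DIMS 3 /* arbitrary limit for this sketch */

typedef struct {
    int ndims;
    int dims[MAX_DIMS];
} vshape_t;

/* Merge the shape of one more assignment into the variable's shape.
   Returns 0 if the number of dimensions differs (unsupported). */
int merge_shape(vshape_t *var, const vshape_t *assigned)
{
    int d;
    if (var->ndims != assigned->ndims) return 0;
    for (d = 0; d < var->ndims; ++d) {
        if (assigned->dims[d] > var->dims[d]) {
            var->dims[d] = assigned->dims[d];
        }
    }
    return 1;
}
```

Merging 1x3 with 4x1 gives 4x3, as in the zeros example; merging 3x3 with 3x3x3 is rejected.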
Loopify
Translates Matlab/Scilab variables into
• data
• dynamic size
• static (maximum) size
Translates Matlab/Scilab statements into
• loop nest
• size calculation
Scilab:
a = zeros(2,3);
C code:
int32_t a_data[3][2] = {{0}};
int32_t a_size[2];
const int32_t a_ssize[2] = {2, 3};
for (v1 = 0; v1 < 3; ++v1) {
    for (v0 = 0; v0 < 2; ++v0) {
        a_data[v1][v0] = 0;
    }
}
a_size[0] = 2;
a_size[1] = 3;
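Applying the same pattern to an element-wise operation gives the following sketch. It is a hypothetical Loopify-style translation of b = zeros(2,3) + 10; c = a + b; written by hand to follow the slide's conventions (column-major x_data[column][row], dynamic size x_size, static size x_ssize); it is not actual MatrixFrontend output.

```c
#include <stdint.h>

/* 2x3 matrices stored column-major: x_data[column][row]. */
int32_t a_data[3][2] = {{1, 2}, {3, 4}, {5, 6}}; /* a = [1 3 5; 2 4 6] */
int32_t b_data[3][2] = {{0}};
int32_t c_data[3][2] = {{0}};
int32_t b_size[2];
int32_t c_size[2];                 /* dynamic size */
const int32_t c_ssize[2] = {2, 3}; /* static (maximum) size */

void loopified(void)
{
    int32_t v0, v1;

    /* b = zeros(2,3) + 10; : loop nest, then size calculation */
    for (v1 = 0; v1 < 3; ++v1) {
        for (v0 = 0; v0 < 2; ++v0) {
            b_data[v1][v0] = 0 + 10;
        }
    }
    b_size[0] = 2;
    b_size[1] = 3;

    /* c = a + b; : element-wise loop nest, then size calculation */
    for (v1 = 0; v1 < 3; ++v1) {
        for (v0 = 0; v0 < 2; ++v0) {
            c_data[v1][v0] = a_data[v1][v0] + b_data[v1][v0];
        }
    }
    c_size[0] = 2;
    c_size[1] = 3;
}
```

After loopified() runs, c(2,3) corresponds to c_data[2][1] and equals 6 + 10 = 16. Since every loop bound is a constant, the two loop nests remain static control parts.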
Simplify
• Remove unnecessary for loops
• Remove unnecessary variable dimensions
• Remove size variables and statements for fixed-size variables
Scilab:
a = 1;
C code (before Simplify):
int32_t a_data[1][1] = {{0}};
…
for (v1 = 0; v1 < 1; ++v1) {
    for (v0 = 0; v0 < 1; ++v0) {
        a_data[v1][v0] = 1;
    }
}
C code (after Simplify):
int32_t a_data = 0;
…
a_data = 1;
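The Simplify transformation above must preserve behavior. The sketch below wraps both versions in hypothetical functions (a_before, a_after) so that they can be compared: loops with a single iteration are replaced by their body with v0 = v1 = 0, the unit dimensions are dropped, and no size variables remain because the size is fixed.

```c
#include <stdint.h>

/* Before Simplify: a 1x1 variable kept as a 2-D array and assigned
   through a loop nest with a trip count of 1 in each dimension. */
int32_t a_before(void)
{
    int32_t a_data[1][1] = {{0}};
    int32_t v0, v1;
    for (v1 = 0; v1 < 1; ++v1) {
        for (v0 = 0; v0 < 1; ++v0) {
            a_data[v1][v0] = 1;
        }
    }
    return a_data[0][0];
}

/* After Simplify: single-iteration loops and unit dimensions removed. */
int32_t a_after(void)
{
    int32_t a_data = 0;
    a_data = 1;
    return a_data;
}
```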
Results – Lines of Code
Figure: Lines of code (y axis 0 to 5000) for the test cases SIFT, Magic, IFFT and Intracom: Scilab source, generated C before Simplify, and generated C after Simplify.
Start-up company
Solutions for a parallel world
To be founded as a spin-off from KIT, based on results from ALMA
www.emmtrix.com
Interactive Parallelization
Control parallelization by high-level decisions in a GUI => control, traceability, usability
Automatic test generation => reliability
emmtrix Workflow Integration
Figure: emmtrix workflow integration. Development with Scilab (tested on a test PC) is iterated with the emmtrix parallelization solution, which produces parallel C code for verification on the test platform (multi-core processor).
Integration into Scilab workflow
Planned Xcos integration for model-based design
Plans for emmtrix
Soon:
• Release of MatrixFrontend for the Scilab community
  – Free to use
  – Converts Scilab code to C code
Product launch of emmtrix Parallel Studio (not final name) at Embedded World 2016 (February 2016)
• Integration into workflow
• Support for different hardware platforms
• Support for model-based design
Summary
ALMA toolchain
• MatrixFrontend: converts Scilab code to C
• Parallelization of the generated code
• Speeds up development for multi-core systems (30-60% less development effort)
emmtrix Technologies
• Distribution of ALMA results
• Free Scilab-to-C converter: MatrixFrontend
• Interactive parallelization tool