Date post: 28-Dec-2019
Course admin stuff

• First coursework released this evening

– (Why this evening? Maximal lectures before submission)

– Due Jan 27th 23:59, two weeks' time

• Where to find it

– Spec is on github: https://github.com/HPCE/hpce-2014-cw1

• Can just google “m8pple”

– Submission for this coursework is via blackboard

HPCE / dt10/ 2015 / 1.1

Expectations for coursework

• Coursework is not lab [1]

– You have to manage when, where, and how long you spend on it

– 100% coursework does not mean easy [1]

• Long hours are neither sufficient nor necessary for an A

– Like anything else: some people are just good at it

– But, a good correlation between organisation and marks

• You are expected to be reasonably independent

– This is a masters level course

[1] – Though the earlier parts kind of are.

Working together

• The software community has a tradition of sharing

– Many open-source projects, some of which you will rely on

– Lots of forums for discussing problems: Stack Overflow, ...

• Approach this work in the same way

– You may encounter the same problems as other students

– Discuss solutions with each other, help each other out

– One-on-one discussions, github issues, whatever

• https://github.com/HPCE/hpce-2014-cw1/blob/master/background-bugs.md

– Give credit or thanks if appropriate: be excellent to each other

• But you have to balance co-operation and competition

– The later courseworks require good ideas and strategies

– Up to you to protect your IP.

Plagiarism

• All submitted material must be written by you

– Do not share any code with each other (except within pairs)

• No plagiarism checking software: I just read the code

– Some similarity of structure is expected, but there are limits

– Students are amusingly bad at obfuscation

– You need to be able to explain any code you submit in the oral

– Suspected plagiarism will be passed to the plagiarism committee

• If necessary you may use code from third-party sources

– e.g. open-source projects, samples, Stack Overflow, ...

– Origin and extent must be very clearly shown

– Need to be able to justify (orally) why it was used

– Should be aware of potential licensing implications

Matlab: why?

• Matlab is not really high performance

– Interpreted (though it is JIT compiled these days)

– Dynamically typed

– Poor at loops (even with the JIT)

• But it is high productivity

– The toolboxes can be a massive timesaver

– Development is interactive due to the REPL interface

• REPL = Read-Eval-Print Loop

• Encourages experimentation

• And it can be fast

– Just need to talk to it right

All about vectorisation

• Traditional wisdom: “you need to vectorise matlab”

– The best kind of wisdom: it’s actually correct

• But vectorisation is not only useful for matlab

– Memory system: Maximises cache performance

– Execution: Can utilise SIMD units in modern CPUs

• SIMD = Single-Instruction Multiple Data: SSE, AVX, MMX, ...

– Scheduling: Reduce dependency tracking overhead

– Parallelism: Vectorised code often implies parallel code

• Principles from matlab vectorisation apply in OpenCL

– Easier to learn them here

What can be vectorised?

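• The obvious: single for loops

x=randn(1,1e6);
n=numel(x); % Loop bound: number of elements in x
tic;
for t=1:n
  x(t) = x(t) * x(t);
end
toc

• Life sometimes is easy

tic;
x = x .* x; % Element-wise square of the whole vector
toc

• Though watch out...

x = x' * x; % Outer product, not an element-wise square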
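A small sketch of why the last form needs care. The three-element vector is an assumption chosen so the shapes are easy to see; nothing here is from the coursework.

```matlab
x = [1 2 3];   % small row vector (illustrative values)
a = x .* x;    % element-wise product: same shape as x
b = x' * x;    % column times row: outer product, 3-by-3 matrix
c = x * x';    % row times column: inner product, a scalar
```

With the 1e6-element vector used on the slide, `x' * x` would ask for a 1e6-by-1e6 matrix of doubles, roughly 8 TB, so a missing dot is not just slow but usually fatal.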

Less obvious things...

• Vectorised in the weak sense

– One statement is operating on lots of data (not just vectors)

• Making use of a vector mask

– Apply some condition to everything in a vector

– Get mask indicating where the condition is true

– Select just those elements that meet the criterion

x=2:R; % All integers in range 2..R

o=x'*x; % Outer product of vector with itself

% Check which numbers are in the product matrix

mask=ismember(x,o);

res=x(~mask); % select any that aren't
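The snippet above leaves R undefined. A runnable version, assuming R=30 purely for illustration, can be checked against the built-in `primes`:

```matlab
R = 30;                 % assumed upper bound, chosen for illustration
x = 2:R;                % all integers in range 2..R
o = x' * x;             % every product a*b with a,b in 2..R
mask = ismember(x, o);  % true where x is such a product, i.e. composite
res = x(~mask);         % keep the values that are not a product
ok = isequal(res, primes(R));
```

The outer product holds (R-1)^2 values, so memory grows quadratically with R. This form is short rather than economical.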

mask=true(1,R); % All numbers initially prime

mask(1)=false; % ...except 1, which is not prime

p=2; % Start from the smallest prime

while p < R

% Mark all multiples of the current prime

mask(2*p:p:end)=0;

% Find next number above p that is still marked
% (p becomes empty when none is left, which ends the loop)

p=find(mask(p+1:end),1,'first')+p;

end

res=find(mask); % Gather indices that are marked (avoids shadowing built-in primes)

• Even weaker form of vectorisation

– “find” within loop is order dependent: can’t parallelise

• But natively supported by matlab

– “find” is a primitive

– As basic as the “add” instruction in x86

– Most of the cost is in scheduling the primitive; execution is cheap
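If scheduling each primitive dominates, then fewer loop iterations should help. A sketch of one standard refinement (not from the slides): stop once p*p exceeds R, and start marking at p*p. R=100 and the counter `iters` are assumptions for illustration.

```matlab
R = 100;                  % assumed upper bound, chosen for illustration
mask = true(1, R);
mask(1) = false;          % 1 is not prime
p = 2;
iters = 0;                % counts how many times the loop body runs
while p*p <= R            % any composite <= R has a prime factor <= sqrt(R)
    mask(p*p:p:end) = false;   % smaller multiples of p were cleared by smaller primes
    p = find(mask(p+1:end), 1, 'first') + p;
    iters = iters + 1;
end
found = find(mask);
```

For R=100 the body runs only for p = 2, 3, 5 and 7, instead of once per prime below R.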

More practical: image processing

• Quantisation: reduce the colour depth of images

– e.g. Take 256 level grayscale, produce 1-bit image

im=imread('lena-std_512x512.png' );

im=double(rgb2gray(im))/256;

res=zeros(size(im));

for x=1:size(im,1)

for y=1:size(im,2)

res(x,y) = im(x,y) > 0.5;

end

end

imshow(res);

res = im>0.5;

imshow(res);
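To check that the two forms agree without needing the image file, here is a sketch on a synthetic image. The `rand` input and the names `resLoop` and `resVec` are assumptions for illustration.

```matlab
im = rand(64, 48);              % synthetic stand-in for the grayscale image
resLoop = zeros(size(im));
for x = 1:size(im, 1)
    for y = 1:size(im, 2)
        resLoop(x, y) = im(x, y) > 0.5;   % one comparison per pixel
    end
end
resVec = im > 0.5;              % one comparison over the whole array
same = isequal(resLoop, resVec);
```

The values match, though the types differ: the loop fills a double array, while the vectorised comparison returns a logical array.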

res = im>0.5;

imshow(res);

f = @(v)( v>0.5 );

res = f(im);

imshow(res)

res = arrayfun(f, im);

imshow(res);

• Create anonymous function with argument “v”

• Define expression as body for anonymous function

• Assign function to variable f

• Variable f can now be called as a function

• Can pass f to other functions
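A short sketch of the two ways of using f, on a synthetic image. The variable `thresh` is a hypothetical addition to show that an anonymous function captures values when it is created.

```matlab
im = rand(16, 16);           % synthetic stand-in for the image
thresh = 0.5;                % hypothetical parameter, not in the slides
f = @(v)( v > thresh );      % the value of thresh is captured here
thresh = 0.9;                % changing it later does not affect f
direct = f(im);              % one call, f sees the whole array
perElem = arrayfun(f, im);   % f is called once for every element
```

Both give the same mask. The `arrayfun` form pays for one function call per pixel, so it tends to behave more like the explicit loop than like the vectorised comparison.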

More intelligent quantisation

• Dithering: cumulative error due to quantisation is tracked

• Can it be vectorised ?

acc=0;

for x=1:w

for y=1:h

acc = acc + src(x,y);

quantised = round(acc*(levels-1))/(levels-1);

res(x,y)=quantised;

acc = acc - quantised;

end

end

Loop carried dependency through acc
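A runnable sketch of the loop above. The sizes, `levels=2` and the constant 0.3 image are assumptions for illustration. It checks the point of tracking the error: total output brightness stays within half a quantisation step of total input brightness.

```matlab
w = 4; h = 8; levels = 2;       % assumed sizes and colour depth
src = 0.3 * ones(w, h);         % flat mid-dark image (illustrative)
res = zeros(w, h);
acc = 0;
for x = 1:w
    for y = 1:h
        acc = acc + src(x, y);                           % add wanted brightness
        quantised = round(acc*(levels-1))/(levels-1);    % nearest available level
        res(x, y) = quantised;
        acc = acc - quantised;                           % carry the error forward
    end
end
drift = abs(sum(res(:)) - sum(src(:)));   % equals |acc| up to rounding error
```

Plain thresholding at 0.5 would turn this image entirely black. Here a fraction of the pixels come out white so that the average is preserved.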

acc=0;

x=1:w;

for y=1:h

acc = acc + src(x,y);

quantised = round(acc*(levels-1))/(levels-1);

res(x,y)=quantised;

acc = acc - quantised;

end

acc=0;

for x=1:w

for y=1:h

acc = acc + src(x,y);

quantised = round(acc*(levels-1))/(levels-1);

res(x,y)=quantised;

acc = acc - quantised;

end

end
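The vectorised form does not compute quite the same thing as the nested loop: acc becomes a column with one accumulator per x, so error is no longer carried from the end of one row into the next. A sketch that makes this explicit, comparing against a scalar loop that resets acc for each x (sizes and random input are assumptions):

```matlab
w = 4; h = 6; levels = 2;       % assumed sizes and colour depth
src = rand(w, h);

% Reference: scalar loops, accumulator reset at the start of each x
ref = zeros(w, h);
for x = 1:w
    acc = 0;
    for y = 1:h
        acc = acc + src(x, y);
        quantised = round(acc*(levels-1))/(levels-1);
        ref(x, y) = quantised;
        acc = acc - quantised;
    end
end

% Vectorised along x: acc is w-by-1, one independent accumulator per x
res = zeros(w, h);
acc = zeros(w, 1);
for y = 1:h
    acc = acc + src(:, y);
    quantised = round(acc*(levels-1))/(levels-1);
    res(:, y) = quantised;
    acc = acc - quantised;
end
same = isequal(res, ref);
```

Breaking the dependency between rows is what makes the x direction vectorisable. The dependency along y remains, so that loop stays.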

2D error diffusion

• Attempt to diffuse error both across and down image

• Reduce tendency towards banding effects

• More difficult loop carried dependency

• Have a write before read dependency

– Current loop iteration reads from (x,y)

– Writes (x,y+1), (x+1,y), and (x+1,y+1)

– Three constraints per node

for x=1:w-1

for y=1:h-1

desired = src(x,y);

quantised = round(desired*levelsSub1)*invLevelsSub1;

src(x,y)=quantised;

error = desired - quantised;

src(x+1,y) = src(x+1,y) + error*0.4;

src(x,y+1) = src(x,y+1) + error*0.4;

src(x+1,y+1) = src(x+1,y+1) + error*0.2;

end

end

(x,y) (x+1,y)

(x,y+1) (x+1,y+1)
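A runnable sketch of the loop above, with assumed values for `levelsSub1` and `invLevelsSub1` and a synthetic gradient as input. The error variable is renamed `err` here because `error` is also a built-in function name. Since the weights 0.4 + 0.4 + 0.2 sum to 1, each step moves brightness around without creating or destroying any.

```matlab
w = 8; h = 8; levels = 2;               % assumed sizes and colour depth
levelsSub1 = levels - 1;
invLevelsSub1 = 1 / levelsSub1;
src = repmat(linspace(0, 1, h), w, 1);  % synthetic gradient (illustrative)
before = sum(src(:));
for x = 1:w-1
    for y = 1:h-1
        desired = src(x, y);
        quantised = round(desired*levelsSub1)*invLevelsSub1;
        src(x, y) = quantised;
        err = desired - quantised;
        src(x+1, y)   = src(x+1, y)   + err*0.4;   % neighbour at (x+1,y)
        src(x, y+1)   = src(x, y+1)   + err*0.4;   % neighbour at (x,y+1)
        src(x+1, y+1) = src(x+1, y+1) + err*0.2;   % diagonal neighbour
    end
end
after = sum(src(:));
```

The last row and column are never quantised by these loop bounds. They only collect the error pushed into them.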

Generalise to the full grid

(1,1) (1,2) (1,3) (1,4)

(2,1) (2,2) (2,3) (2,4)

(3,1) (3,2) (3,3) (3,4)

(4,1) (4,2) (4,3) (4,4)

Serial execution: y then x

[Figure: the 4×4 grid of (x,y) cells shown above]

Serial execution: x then y

[Figure: the 4×4 grid of (x,y) cells shown above]

Vectorisation: can’t do it along x

[Figure: the 4×4 grid of (x,y) cells shown above]

Vectorisation: can’t do it along y

[Figure: the 4×4 grid of (x,y) cells shown above]

Skewing the loops

[Figure: the 4×4 grid of (x,y) cells shown above]

Or viewed another way

[Figure: the 4×4 grid of (x,y) cells shown above]

So how to do that in matlab?

• C and Fortran compilers may try to do this for you

– Can do a decent job for small loop kernels

– Have difficulty detecting when it is safe to apply

– Active research area: polyhedral compilation techniques

• A lot of the time you have to do it yourself

– Matlab isn’t clever enough: not enough info in matlab code

– Must be explicitly handled by programmer in OpenCL

• Compiler is not allowed to do it

– Even in multi-core it crops up

• Some basic techniques to make it easier
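One way to do the skewing by hand is sketched below for the 2D error diffusion loop. It is not necessarily the approach the course intends. Cells with the same x+y lie on one anti-diagonal and only depend on earlier anti-diagonals, so each diagonal can be handled in one vector statement. The sizes, random input and helper names (`idx`, `xs`, `ys`, `err`) are assumptions.

```matlab
w = 6; h = 5; levels = 2;               % assumed sizes and colour depth
levelsSub1 = levels - 1;
invLevelsSub1 = 1 / levelsSub1;
src0 = rand(w, h);

% Serial reference: the original nested loops
ref = src0;
for x = 1:w-1
    for y = 1:h-1
        desired = ref(x, y);
        quantised = round(desired*levelsSub1)*invLevelsSub1;
        ref(x, y) = quantised;
        err = desired - quantised;
        ref(x+1, y)   = ref(x+1, y)   + err*0.4;
        ref(x, y+1)   = ref(x, y+1)   + err*0.4;
        ref(x+1, y+1) = ref(x+1, y+1) + err*0.2;
    end
end

% Skewed version: one loop over anti-diagonals d = x+y
src = src0;
for d = 2:(w-1)+(h-1)
    xs = max(1, d-(h-1)):min(w-1, d-1);   % valid x values on this diagonal
    ys = d - xs;
    idx = sub2ind([w h], xs, ys);         % linear indices of the cells
    desired = src(idx);
    quantised = round(desired*levelsSub1)*invLevelsSub1;
    src(idx) = quantised;
    err = desired - quantised;
    src(idx+1)   = src(idx+1)   + err*0.4;   % (x+1,y): +1 in column-major order
    src(idx+w)   = src(idx+w)   + err*0.4;   % (x,y+1): +w in column-major order
    src(idx+w+1) = src(idx+w+1) + err*0.2;   % (x+1,y+1)
end
maxDiff = max(abs(src(:) - ref(:)));
```

Each of the three update statements touches distinct cells, so the indexed read-modify-write is safe. The loop count drops from (w-1)*(h-1) to w+h-3, at the price of short vectors near the corners.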
