
Final Presentation

Parallel Covariance Matrix Creation

Supervisor: Oded Green

Ami Galperin, Lior David

April 18, 2010


Table of Contents - Overview

Introduction
Building the covariance matrix
  The naïve algorithm
  Our algorithm
    Terminology
    The Algorithm
    Optimizations
    Results
MVM on Plurality
  The MVM algorithm
  Plurality Platform
  Results
Future Projects
Conclusions


Project’s Goals

• Develop a parallel algorithm for creating a covariance matrix
• Ensure compatibility with Plurality's HAL platform
• Maximize parallelization and core utilization
• Integrate the algorithm into Elta's MVM (Minimum Variance Method) algorithm implementation


MVM Algorithm

MVM is a modern 2-D spectral estimation algorithm used by Elta's Synthetic Aperture Radar (SAR). The MVM algorithm:

• Improves resolution
• Removes side-lobe artifacts (noise)
• Reduces speckle compared to what is possible with conventional Fourier-transform SAR imaging techniques

One of MVM's main building blocks is the creation of a covariance matrix.

Plurality PlatformPlurality’s HyperCore Architecture Line (HAL) family of massively parallel manycore processors features:

Unique task-oriented programming modelNear-serial programmability High performance at low cost per watt per square millimeterUnique shared memory architecture - 2 MB cache size

7Parallel Covariance Matrix Creation - Final Presentation


Implementing the Naïve Algorithm

Motivation: implementing the naïve algorithm will give us a greater understanding of the parallelization problem.

The Naïve Algorithm

[Figure: the N×M chip matrix, cells C1,1 … CN,M, with an N1×M1 sub-aperture window marked inside it. Pairs of sub-aperture elements are multiplied, one factor conjugated (e.g. C2,2 with C2,3*, C3,2 with C3,3*, …), and the products are accumulated into the covariance matrix R of size (M·N)×(M·N).]

Chip [N×M], Sub-aperture [N1×M1]

Every sub-aperture holds its own covariance matrix Cov.

The covariance matrix Cov is the sum of the Cov matrices of all sub-apertures:

Cov ≈ (1/N0) · Σ_(p,q) V_(p,q) · V_(p,q)†

where V_(p,q) is the vectorized sub-aperture at shift (p,q) and N0 is the number of sub-aperture positions.
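As a rough serial illustration of the naïve scheme just described (a sketch, not the project's code: the complex type, array sizes and function name are assumptions), each sub-aperture is vectorized and its outer product is accumulated into Cov:

#include <complex.h>

#define N   32          /* chip rows (example from the slides)       */
#define M   32          /* chip cols                                  */
#define N1  13          /* sub-aperture rows                          */
#define M1  13          /* sub-aperture cols                          */
#define L   (N1 * M1)   /* length of a vectorized sub-aperture        */

/* cov must be zero-initialized by the caller.
 * For a 32x32 chip and 13x13 sub-aperture this performs
 * 20*20 positions * 169*169 products ~ 11.4M multiplications,
 * matching the count quoted on the slides.                          */
void naive_cov(const float complex chip[N][M], float complex cov[L][L])
{
    for (int p = 0; p + N1 <= N; ++p)           /* every sub-aperture position */
        for (int q = 0; q + M1 <= M; ++q) {
            float complex v[L];
            for (int i = 0; i < N1; ++i)        /* vectorize the window        */
                for (int j = 0; j < M1; ++j)
                    v[i * M1 + j] = chip[p + i][q + j];
            for (int a = 0; a < L; ++a)         /* Cov += v * v^H              */
                for (int b = 0; b < L; ++b)
                    cov[a][b] += v[a] * conjf(v[b]);
        }
}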

The Naïve Algorithm - Shortcomings

• Each multiplication is executed many times: for a 32×32 chip, the total number of multiplications is 11.4M, while the optimal number is 208K (×28!)
• The naïve algorithm is difficult to parallelize, with two main difficulties:
  - Simultaneous writes to the same Rcells require mutexes
  - The memory cost of holding a Cov matrix for every permutation (each is 250 KB) is too high

The Naïve Algorithm on the Plurality Platform - Disadvantages

• Mutexes - add complexity
• Memory space - the cache is only 2 MB

The problem requires a different solution!

Our Algorithm

A whole different ball game!

But first …

Before presenting the algorithm, we need to establish a common language for the terms we have created.


Terminology

The following terms are defined on the chip matrix [N×M] and on Cov, the covariance matrix:

• Permutation - a pair of multiplier cells (M1, M2) inside the sub-aperture, identified by their relative displacement. Examples: Permutation [1,0], Permutation [1,1].
• Block - the rectangular window spanned by the two multipliers M1 and M2.
• BNW - a placement of the Block inside the chip; each Block has a set of legal BNWs.
• Shifting - moving the window that contains the Block, only upwards and leftwards; the Block always stays inside the shifted window. The shift of (0,0) is named the Zero iteration.
• Cov - the covariance matrix, of size [M·N × M·N], with cells R1,1 … RM·N,M·N.
• Rcell - a single cell R(i,j) of Cov.


Our Algorithm – Key Features

Concept: each Rcell in Cov is calculated by one specific permutation. This enables different permutations to work simultaneously.

Key features:
• Parallel
• Each multiplication is executed once (208K for a 32×32 chip)
• Memory efficient
• Generic


Our Algorithm (simplified)


1. For each permutation (1:313)

1.1 For each legal BNW

1.1.1. Multiply the two multipliers

1.1.2. For each legal shift (including the zero iteration)

1.1.2.1. Add the multiplication product to the matching Rcell in Cov

Our algorithm (simplified) - Finding all unique permutations

An iterative algorithm:
1. Initialize the Delta set D and the Permutation set P
2. For each pair of cells (M1, M2) in an N1×M1 matrix:
   2.1. If |M1 - M2| is not in D:
        2.1.1. Add |M1 - M2| to D
        2.1.2. Add (M1, M2) to P

The unique permutation count is 313 (for a 13×13 sub-aperture).

Executed off-line
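As a quick check of the 313 figure, a brute-force sketch (assuming a 13×13 sub-aperture; not the project's off-line tool) can enumerate the cell-pair displacements, counting each ± pair once:

#include <stdbool.h>
#include <stdio.h>

#define S 13                                        /* sub-aperture side (example) */

int main(void)
{
    bool seen[2 * S - 1][2 * S - 1] = {{false}};    /* indexed by (dx+S-1, dy+S-1) */
    int count = 0;

    for (int r1 = 0; r1 < S; ++r1)
        for (int c1 = 0; c1 < S; ++c1)
            for (int r2 = 0; r2 < S; ++r2)
                for (int c2 = 0; c2 < S; ++c2) {
                    int dx = r2 - r1, dy = c2 - c1;
                    if (dx < 0 || (dx == 0 && dy < 0))    /* take each +/- delta once   */
                        continue;
                    if (!seen[dx + S - 1][dy + S - 1]) {
                        seen[dx + S - 1][dy + S - 1] = true;
                        ++count;                          /* (r1,c1),(r2,c2) would join P */
                    }
                }

    printf("%d unique permutations\n", count);            /* prints 313 for S = 13       */
    return 0;
}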

Our algorithm (simplified) - Walkthrough

The original slides animate the chip [N×M] next to Cov, the covariance matrix [M·N × M·N]:

• For a given Permutation [1,1], the two multipliers M1 and M2 are marked inside the chip.
• The multipliers define a Block.
• The legal BNWs for this Block are marked.
• For a given BNW, the product RES = M1 · M2* is computed.
• The multipliers are numbered (1-9 in the example Block).
• At the Zero Iteration, RES is written into Rcell (1,5), which lies on Diag(5-1), a fixed offset from the main diagonal of Cov.
• Shifting: each further shift writes RES into another Rcell on the same diagonal - Rcell (2,6), then Rcell (5,9), and so on.

Our algorithm (simplified)

We came across a regularity in the offset of the Rcell coordinates when shifting (sketched below):

• Leftwards: (+Sub-ap size, +Sub-ap size)
• Upwards: (+1, +1)
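A minimal sketch of how that regularity could map a shift to Rcell coordinates (SUB_AP, rcell_t and the function name are illustrative assumptions, not the project's API):

#define SUB_AP 13                          /* sub-aperture side (example)        */

typedef struct { int row, col; } rcell_t;

/* Rcell coordinates after 'up' upward shifts and 'left' leftward shifts,
 * starting from the permutation's zero-iteration Rcell: every upward shift
 * adds (+1,+1) and every leftward shift adds (+SUB_AP,+SUB_AP).             */
static rcell_t shifted_rcell(rcell_t zero_iter, int up, int left)
{
    rcell_t r = zero_iter;
    r.row += up + left * SUB_AP;
    r.col += up + left * SUB_AP;
    return r;
}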


[Figure: Cov with its Rcells colored by permutation - each color represents a different permutation.]

Our Algorithm (simplified) - Summary

For a given permutation:
• RES is always written into the same group of Rcells
• They all lie on the same diagonal (though not necessarily on every cell of that diagonal)
• There is no overlap between the Rcells of different permutations - the basis for parallelism!
• Each shift writes to one unique Rcell, which theoretically enables parallelism at Rcell granularity (one instance per Rcell)


Permutations Execution Times

Different permutations have different workloads, so changing the order of permutation execution may improve core utilization.

Parallelization Opportunities

• Different permutations can work simultaneously
• Different chips can work simultaneously
• Finer-grained parallelism at Rcell granularity (one instance per Rcell)

Platform Comparison - Plurality vs. Distributed Systems

• Our algorithm is optimal for shared-memory platforms, since Cov is shared by all cores
• Working on distributed-memory platforms would hurt its efficiency because of communication overhead
• Plurality provides much higher performance-power utilization than Elta's grid computing


Look-up Tables

Concept: execute many data-independent calculations off-line and store the results in memory-efficient static look-up tables.

Advantages:
• Reduces calculation at run time by 50%
• The same tables are used for all chips

Look-up tables - Permutations Table

Holds the relevant permutation info: the multipliers' indexes, the block borders, and the zero-iteration coordinates.

Optimal table size: (4·6 + 8·2) bit · 313 ≈ 1.5 KB
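One way to picture such an entry (an illustrative C layout, not Plurality's or the project's actual definition) packs six 4-bit coordinates and two 8-bit zero-iteration coordinates into 40 payload bits:

/* 4*6 + 8*2 = 40 payload bits per entry; 313 entries ~ 1.5 KB of payload. */
struct perm_entry {
    unsigned m1x : 4, m1y : 4;    /* first multiplier coordinates            */
    unsigned m2x : 4, m2y : 4;    /* second multiplier coordinates           */
    unsigned bx  : 4, by  : 4;    /* block borders (rows, columns)           */
    unsigned char refx, refy;     /* zero-iteration (REF) coordinates        */
};

static struct perm_entry perm_table[313];   /* one entry per permutation     */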

Look-up tables - Offsets Table

Concept: maps each shift to an Rcell, using the regularity in the offset of the Rcell coordinates when shifting upwards (+1,+1) or leftwards (+Sub-ap size, +Sub-ap size).

Optimal table size: 2 · (13·13·8) bit · 313 ≈ 106 KB

Using Matrix Characteristics

Concept: use matrix characteristics to reduce calculations.

Important observation: Cov is a Hermitian matrix, R = R†, i.e. R(i,j) = R*(j,i).

Highlight: build only Cov's upper triangle and, when necessary, generate the lower triangle inexpensively.

Advantages:
• Reduces calculations by half
• Requires less space for storing the Cov matrix
• Most eigendecomposition algorithms require only the upper triangle
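A minimal sketch of that inexpensive lower-triangle generation (the row-major layout and names are assumptions, not the project's code):

#include <complex.h>

/* Fill the lower triangle of an L x L Hermitian matrix stored row-major,
 * given that only the upper triangle has been computed: R(j,i) = conj(R(i,j)). */
static void mirror_lower(float complex *cov, int L)
{
    for (int i = 0; i < L; ++i)
        for (int j = i + 1; j < L; ++j)
            cov[j * L + i] = conjf(cov[i * L + j]);
}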


Results (x86)

Different Chip Sizes

[Chart: run time in seconds (0-0.12) vs. chip size (6, 8, 16, 26, 32, 33, 36) for the naïve algorithm and ours.]

Not optimized for x86

Results (x86) - Different Sub-aperture Sizes

[Chart: run time in seconds (0-0.08) vs. sub-aperture size (3, 4, 6, 8, 11, 13, 15) for the naïve algorithm and ours.]

Not optimized for x86


Elta's MVM Algorithm - Preliminary Algorithm

Elta's processing chain: 2D FFT → fragmentation into 32×32 chips → 2D IFFT per chip → MVM per chip → attachment of the chips.

The original SAR image is segmented into overlapping 32×32 chips, MVM is applied to each chip, and the chips are then attached to each other to form a full-size MVM image.

Task Map - MVM

[Diagram: the MVM task map, with stages INIT, Segmentation, FFT, Covariance, Eigenvalues, 2D-IFFT, Attachment and FINISH; the main effort is marked on the map.]


Plurality Platform

Plurality's HyperCore Architecture Line (HAL) family of massively parallel manycore processors includes:

• 16 to 256 32-bit RISC cores
• 4-64 co-processors, each containing a floating-point unit and a multiplier/divider; each co-processor is shared by four RISC cores
• A shared-memory architecture - 2 MB, with no level-one cache
• A hardware-based scheduler that supports a task-oriented programming model
• A cycle-accurate simulator that runs on an x86 platform, integrated into the Eclipse IDE
• An emulator supporting native Linux and Windows environments

Plurality's Platform - Emulator

The emulator mimics the behavior of HAL's hardware scheduler while running on an x86 processor under Linux or Windows.

Advantages:
• No need to move to new hardware and a new programming model
• Written in ANSI C, so almost any compiler can build it
• Comes with a prebuilt Makefile and a Visual Studio solution
• Calls each task with all its required information: the right task instance, the right timing, and the right core ID
• However, it is not cycle-accurate!

Plurality's Platform - Simulator

A cycle-accurate hardware simulator that simulates the exact behavior of the real HAL hardware. The simulator is integrated into the Eclipse IDE, but it is very hard to debug with.

Advantages:
• Cycle-accurate simulation
• Uses GNU's well-known binutils and the GDB debugger
• Integrated into the Eclipse IDE
• Eases the transition to hardware

Plurality's Platform - Implementations

• Compilation of the whole MVM algorithm using Plurality's emulator
• Compilation of our covariance matrix creation program using Plurality's simulator
• Using the Eclipse development environment to measure cycle-accurate performance

Added Features - N-of-M Pre-compiler

Motivation: overcome an unimplemented Plurality feature and allow manual scheduling in order to preserve processing time.

For a given task, it limits the number of concurrent instances out of the task's defined quota. Implemented in Perl.


Task Map - MVM on Plurality

[Diagram: the MVM task map - INIT, Segmentation, 2D-FFT, Covariance, Eigenvalues, 2D-IFFT, Attachment, FINISH - annotated with the execution target of each stage (x86, emulator, or simulator).]

Results (complete MVM on emulator)

• 2D (I)FFT and eigendecomposition run on the x86 using Intel's MKL as a black box
• Compiled to native x86 code, but not fully optimized


Results (building the covariance matrix on the simulator)

[Chart: "Speedup for 61 Permutations" - cycle speedup (0-11) vs. number of cores (2, 4, 8, 16, 32, 64, 128, 256).]

Chip size: 15×15, sub-aperture size: 6×6

Results (building the covariance matrix on the simulator)

[Chart: "Speedup for 113 Permutations" - cycle speedup (0-19) vs. number of cores (2, 4, 8, 16, 32, 64, 128, 256).]

Chip size: 20×20, sub-aperture size: 8×8


Future Projects

Completing MVM on Plurality:

• Implement a parallel algorithm for finding the eigenvalues and eigenvectors of a dense Hermitian matrix
• 2D (I)FFT on Plurality using Plurality's 1-D library
• Task-map optimizations

Solving The Eigenvalues Problem

• High complexity
• Many algorithms: QR, SVD, D&C, Jacobi, etc.
• Many off-the-shelf solutions: Intel, AMD, IBM, GNU, LAPACK, NAG, FEAST, etc.
• Shared memory ∩ Parallel ∩ Open-source C = ∅

MRRR (Multiple Relatively Robust Representations)

Main features:
• Fast - O(n²) for an n×n matrix
• Parallel
• Memory efficient - O(n²) for an n×n matrix
• Complex data structures
• No implementation available

Optimal for Plurality's platform


Opportunities

• Our algorithm is unique: no parallel solution has been available to date, and it may be applied to other signal-processing problems
• An implementation of MRRR is possible, which would enable the complete MVM algorithm to run on Plurality's platform
• Running our solution on Plurality's platform is very appealing, since Plurality provides higher performance-power utilization than grid computing and faster run times

Practical Implications

Plurality's low-power platform may enable integrating SAR:

• On satellites
• On unmanned aerial vehicles (UAVs)
• More implications …

Thank you


Back-up Slides


Elta's MVM Algorithm

Assembling a SAR radar picture consists of two phases:

1. Conventional SAR: incoming radar echoes → data manipulation (RMC, adaptive pre-sum, MOCOMP, autofocus, polar-to-rectangular interpolation) → filtering and 2D IFFT → SAR image.

2. MVM method in SAR: identify a target of interest on the SAR image → obtain virtual SAR raw data corresponding to the selected target (2D FFT) → MVM process → MVM SAR image of the selected target.

Our algorithm (Optimizations) - Look-up tables

Concept: execute many calculations in advance and save them in memory-efficient static look-up tables.

Advantages:
• Greatly reduces calculation at run time
• The same table is used for all chips

Our algorithm (Optimizations) - Look-up tables - Permutation Table

Holds the relevant info for each permutation, one row per permutation (rows 1 … 313) with the columns:
M1x, M1y, M2x, M2y, Bx, By, REFx, REFy

• [M1x, M1y] - coordinates of the first multiplier
• [M2x, M2y] - coordinates of the second multiplier
• Bx - the number of rows of the permutation block
• By - the number of columns of the permutation block
• [REFx, REFy] - the coordinates of the pixel in the REF matrix (at the Zero iteration)

Optimal table size: (4·6 + 8·2) bit · 313 = 1.565 KB

Our algorithm (Optimizations) - Look-up tables - Offsets Table

Concept: maps each shift to a pixel, using the regularity in the offset of the pixel coordinates when shifting upwards (+1,+1) or leftwards (+Sub-ap size, +Sub-ap size).

Table's construction: first we create a general matrix containing all possible pixel offsets, where Matrix[i,j] is the offset when shifting i steps upwards and j steps leftwards.

For a 13×13 sub-aperture the matrix holds the offsets 0-168 (the number of leftward shifts decreases from left to right across the columns):

156 143 130 117 104  91  78  65  52  39  26  13   0
157 144 131 118 105  92  79  66  53  40  27  14   1
158 145 132 119 106  93  80  67  54  41  28  15   2
159 146 133 120 107  94  81  68  55  42  29  16   3
160 147 134 121 108  95  82  69  56  43  30  17   4
161 148 135 122 109  96  83  70  57  44  31  18   5
162 149 136 123 110  97  84  71  58  45  32  19   6
163 150 137 124 111  98  85  72  59  46  33  20   7
164 151 138 125 112  99  86  73  60  47  34  21   8
165 152 139 126 113 100  87  74  61  48  35  22   9
166 153 140 127 114 101  88  75  62  49  36  23  10
167 154 141 128 115 102  89  76  63  50  37  24  11
168 155 142 129 116 103  90  77  64  51  38  25  12

Then we add each permutation's Zero Iteration coordinates (x,y) to the matrix, forming that permutation's offsets table:

zero-iteration coords (x,y) + the 13×13 offsets matrix above
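A small sketch of this two-step construction (illustrative names; the slide adds the zero-iteration (x,y) coordinates, which are folded here into a single linear index for simplicity, and the project's real table packs 8-bit entries rather than int):

#define SUB   13                       /* sub-aperture side                    */
#define PERMS 313                      /* number of unique permutations        */

static int base_offset[SUB][SUB];      /* the general 13x13 matrix above       */
static int offsets[PERMS][SUB][SUB];   /* one table per permutation            */

static void build_offset_tables(const int zero_iter[PERMS])
{
    for (int i = 0; i < SUB; ++i)              /* i steps upwards               */
        for (int j = 0; j < SUB; ++j)          /* j steps leftwards             */
            base_offset[i][j] = i + SUB * j;   /* reproduces the values 0..168  */

    for (int p = 0; p < PERMS; ++p)            /* add each permutation's        */
        for (int i = 0; i < SUB; ++i)          /* zero-iteration reference      */
            for (int j = 0; j < SUB; ++j)
                offsets[p][i][j] = zero_iter[p] + base_offset[i][j];
}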


This is repeated for each of the 313 permutations (313 × [zero-iteration coords + offsets matrix]).


Optimal table size: (13·13·8) bit · 313 = 52.9 KB


Our algorithm (Optimizations) - Using Matrix Characteristics

Concept: use matrix characteristics to reduce calculations.

Important observation: REF is a Hermitian matrix, R = R†, i.e. R(i,j) = R*(j,i).

