Date post: | 19-Dec-2015 |
Category: |
Documents |
View: | 214 times |
Download: | 0 times |
Final Presentation
Parallel Covariance Matrix Creation
Supervisor:Oded Green
Ami Galperin Lior David
2Parallel Covariance Matrix Creation - Final Presentation
Table of Contents - Overview Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
3Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
4Parallel Covariance Matrix Creation - Final Presentation
Project’s Goals
Developing a parallel algorithm for the creation of a covariance matrixCompatibility with Plurality’s HAL platformMaximized parallelization and core utilizationIntegrating the algorithm into Elta’s MVM (Minimum Variance Method) algorithm implementation
April 18, 2010
5Parallel Covariance Matrix Creation - Final Presentation
MVM Algorithm MVM is a modern 2-D spectral estimation algorithm used by Elta’s Synthetic Aperture Radar (SAR).The MVM algorithm:
Improves resolution Removes side lobe artifacts (noise)Reduces speckle compared to what is possible with conventional Fourier transform SAR imaging techniques
One of MVM’s main building blocks is the creation of a covariance matrix
April 18, 2010
6Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Plurality PlatformPlurality’s HyperCore Architecture Line (HAL) family of massively parallel manycore processors features:
Unique task-oriented programming modelNear-serial programmability High performance at low cost per watt per square millimeterUnique shared memory architecture - 2 MB cache size
7Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
8Parallel Covariance Matrix Creation - Final Presentation
Implementing the Naïve Algorithm
April 18, 2010
Implementing the naïve algorithm will give us a greater understanding of the parallelization problem.
Motivation:
9Parallel Covariance Matrix Creation - Final Presentation
The Naïve Algorithm
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
Chip [NxM]
April 18, 2010
10Parallel Covariance Matrix Creation - Final Presentation
The Naïve Algorithm
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Sub aperture [N1xM1]
11Parallel Covariance Matrix Creation - Final Presentation
The Naïve Algorithm
April 18, 2010
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
12Parallel Covariance Matrix Creation - Final Presentation
The Naïve Algorithm
April 18, 2010
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
C2,4
C3,4
C4,4
C2,3
C3,3
C4,3
C2,3* C3,3* C4,4*C2,2* C3,2* C4,2* C2,4* C3,4* C4,4*
C2,2
C3,2
C4,2
13Parallel Covariance Matrix Creation - Final Presentation
The Naïve Algorithm
April 18, 2010
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
Every Sub-aperture holds its covariance matrix Cov
14Parallel Covariance Matrix Creation - Final Presentation
The Naïve Algorithm
April 18, 2010
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
The covariance matrix Cov is the sum of all Sub-apertures Cov matrixes
1 1N-N*
xx0 0
Cov ~M M
pq pqp q
V V
15Parallel Covariance Matrix Creation - Final Presentation
The Naïve Algorithm
April 18, 2010
Shortcomings
Each multiplication is executed many timesFor a 32x32 chip, the total number of multiplies is 11.4M when the optimal number of multiplications is 208K (x28!)
The naïve algorithm is difficult to parallelize. Two main difficulties:
Simultaneous writing to the same Rcells – requires mutexesMemory cost of holding a Cov matrix for every permutation (each is 250 KB) is too expensive
16Parallel Covariance Matrix Creation - Final Presentation
The Naïve Algorithm
April 18, 2010
Disadvantages
Mutexes - adds complexity Memory space - cache size is only 2 MB
Plurality Platform
The problem requires different solution!
17Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
A Whole different Ball Game!
Our Algorithm
18Parallel Covariance Matrix Creation - Final Presentation
But first …
Before presenting the algorithm there is a need to create a common language for the terms we have created.
April 18, 2010
19Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
20Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Permutation
M1
M2
Examples• Permutation [1,0] • Permutation [1,1]
21Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Permutation
M1
M2
Examples• Permutation [1,0] • Permutation [1,1]
22Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Permutation
M1
M2
Examples• Permutation [1,0] • Permutation [1,1]
23Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Block
M1
M2Block
24Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Block
25Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
BNW
BNW
26Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Shifting
M2
M1Block
Shift only upwardsand leftwards
The block is always inside the shifted window
27Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Shifting
M2
M1Block
Shift only upwardsand leftwards
The block is always inside the shifted window
28Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Shifting
M2
M1Block
Shift only upwardsand leftwards
The block is always inside the shifted window
29Parallel Covariance Matrix Creation - Final Presentation
Terminology
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
April 18, 2010
Shifting
M2
M1Block
Shift only upwardsand leftwards
The block is always inside the shifted window
Shift of (0,0) is named Zero iteration
30Parallel Covariance Matrix Creation - Final Presentation
Terminology
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
April 18, 2010
Cov- The covariance matrix[M N, M N]∙ ∙
31Parallel Covariance Matrix Creation - Final Presentation
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
Terminology
April 18, 2010
Rcell
32Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
33Parallel Covariance Matrix Creation - Final Presentation
Our Algorithm – Key Features
April 18, 2010
ParallelEach multiplication is executed once (208k for 32x32 chip)
Memory efficientGeneric
Each Rcell in Cov is calculated by one specific permutation. This enables different permutations to work simultaneously.
Concept:
34Parallel Covariance Matrix Creation - Final Presentation
Our Algorithm (simplified)
April 18, 2010
1. For each permutation (1:313)
1.1 For each legal BNW
1.1.1. Multiply the two multipliers
1.1.2. For each legal shift (including the zero iteration)
1.1.2.1. Add the multiplication product to thematching Rcell in Cov
35Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Our algorithm (simplified)Finding all unique permutations
Iterative algorithm1. Initialize Delta (x,y) set and Permutation(x,y) set2. For each pair of cells (M1,M2) in a N1xM1 matrix
2.1. If |M1-M2| is not in D2.1.1. Add |M1-M2| to D2.1.2. Add (M1,M2) to P
Unique permutation count is 313 ( for Sub-aperture [13x13])
Executed off-line
36Parallel Covariance Matrix Creation - Final Presentation
Our algorithm (simplified)
April 18, 2010
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
Chip [NxM]Cov- The covariance matrix[M N, M N]∙ ∙
37Parallel Covariance Matrix Creation - Final Presentation
Our algorithm (simplified)
April 18, 2010
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
For a given Permutation [1,1]
M2
M1
38Parallel Covariance Matrix Creation - Final Presentation
Our algorithm (simplified)
April 18, 2010
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
There’s a Block
M2
M1Block
39Parallel Covariance Matrix Creation - Final Presentation
Our algorithm (simplified)
April 18, 2010
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
Leagal BNWs for this Block
M2
M1Block
BNW
40Parallel Covariance Matrix Creation - Final Presentation
Our algorithm (simplified)
April 18, 2010
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
For a given BNW
M2
M1Block
41Parallel Covariance Matrix Creation - Final Presentation
Block
Our algorithm (simplified)
April 18, 2010
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4 C4,5 … C4,M
C5,1 C5,2 C5,3 C5,4 C5,5 … C5,M
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
RES=M1 M2* ∙
M2
M1
RES
42Parallel Covariance Matrix Creation - Final Presentation
Our algorithm (simplified)
April 18, 2010
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4
C5,1 C5,2 C5,3 C5,4
… … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
The multipliers Numbering
1 4 7
2 5 8
3 6 9
Block
43Parallel Covariance Matrix Creation - Final Presentation
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
Our algorithm (simplified)
April 18, 2010
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4
C5,1 C5,2 C5,3 C5,4
… … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
The Zero Iteration
1 4 7
2 5 8
3 6 9
Block 5
1
RESRcell (1,5)
RES
Diag(5-1)
Main Diag
44Parallel Covariance Matrix Creation - Final Presentation
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 … R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
Our algorithm (simplified)
April 18, 2010
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4 C3,5 … C3,M
C4,1 C4,2 C4,3 C4,4
C5,1 C5,2 C5,3 C5,4
… … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
Shifting
1 4 7
2 5 8
3 6 9
Block
45Parallel Covariance Matrix Creation - Final Presentation
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 R2,6 R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
Our algorithm (simplified)
April 18, 2010
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4
C4,1 C4,2 C4,3 C4,4
C5,1 C5,2 C5,3 C5,4
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
Shifting
1 4 7
2 5 8
3 6 9Block
6
2
RES
Rcell (2,6)
Diag(5-1)
Main Diag
RES
RES
46Parallel Covariance Matrix Creation - Final Presentation
Our algorithm (simplified)
April 18, 2010
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3 C3,4
C4,1 C4,2 C4,3 C4,4
C5,1 C5,2 C5,3 C5,4
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
Shifting
1 4 7
2 5 8
3 6 9Block
6
2
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 R2,6 R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 … R3,M∙N
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
47Parallel Covariance Matrix Creation - Final Presentation
R1,1 R1,2 R1,3 R1,4 R1,5 … R1,M∙N
R2,1 R2,2 R2,3 R2,4 R2,5 R2,6 R2,M∙N
R3,1 R3,2 R3,3 R3,4 R3,5 …
R4,1 R4,2 R4,3 R4,4 R4,5 … R4,M∙N
R5,1 R5,2 R5,3 R5,4 R5,5 … R5,
M N∙
… … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
… RM N,M∙ ∙
N
Our algorithm (simplified)
April 18, 2010
C1,1 C1,2 C1,3 C1,4 C1,5 … C1,M
C2,1 C2,2 C2,3 C2,4 C2,5 … C2,M
C3,1 C3,2 C3,3
C4,1 C4,2 C4,3
C5,1 C5,2 C5,3
… … … … … … …
CN,1 CN,2 CN,3 CN,4 CN,5 … CN,M
Shifting
1 4 7
2 5 8
3 6 9Block
9
5
RES
Diag(5-1)
Main Diag
RES
RES
RES
48Parallel Covariance Matrix Creation - Final Presentation
Our algorithm (simplified)
April 18, 2010
We came across a regularity in the offset of the Rcell coordinates when shifting:
Leftwards (+Sub-ap size, +Sub-ap size)Upwards (+1,+1)
R1,1 R1,2 R1,3 R1,4 R1,5 R1,6 R1,7 … R1,M N∙
R2,1 R2,2 R2,3 R2,4 R2,5 R2,6 R2,7 … R2,M N∙
R3,1 R3,2 R3,3 R3,4 R3,5 R3,6 R3,7 … R3,M N∙
R4,1 R4,2 R4,3 R4,4 R4,5 R4,6 R4,7 s… R4,M N∙
R5,1 R5,2 R5,3 R5,4 R5,5 R5,6 R5,7 … R5, M N∙
R6,1 R6,2 R6,3 R6,4 R6,5 R6,6 R6,7 … R6, M N∙
R7,1 R7,2 R7,3 R7,4 R7,5 R7,6 R7,7 … R7, M N∙
… … … … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
RM N∙ ,6
RM N∙ ,7
…RM N,M N∙ ∙
49Parallel Covariance Matrix Creation - Final Presentation
Our algorithm (simplified)
April 18, 2010
Each color represents a different permutation
R1,1 R1,2 R1,3 R1,4 R1,5 R1,6 R1,7 … R1,M N∙
R2,1 R2,2 R2,3 R2,4 R2,5 R2,6 R2,7 … R2,M N∙
R3,1 R3,2 R3,3 R3,4 R3,5 R3,6 R3,7 … R3,M N∙
R4,1 R4,2 R4,3 R4,4 R4,5 R4,6 R4,7 s… R4,M N∙
R5,1 R5,2 R5,3 R5,4 R5,5 R5,6 R5,7 … R5, M N∙
R6,1 R6,2 R6,3 R6,4 R6,5 R6,6 R6,7 … R6, M N∙
R7,1 R7,2 R7,3 R7,4 R7,5 R7,6 R7,7 … R7, M N∙
… … … … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
RM N∙ ,6
RM N∙ ,7
…RM N,M N∙ ∙
50Parallel Covariance Matrix Creation - Final Presentation
Our Algorithm (simplified)
April 18, 2010
SummaryFor a given permutation:
RES is always written into the same group of Rcells
All on the same diagonalNot necessarily all diagonal cells
There is no overlapping between Rcells of different permutations.The basis for parallelism!Each shift writes to one unique Rcell. Theoretically enables parallelism of Rcell granularity (an instance per Rcell)
R1,1 R1,2 R1,3 R1,4 R1,5 R1,6 R1,7 … R1,M N∙
R2,1 R2,2 R2,3 R2,4 R2,5 R2,6 R2,7 … R2,M N∙
R3,1 R3,2 R3,3 R3,4 R3,5 R3,6 R3,7 … R3,M N∙
R4,1 R4,2 R4,3 R4,4 R4,5 R4,6 R4,7 s… R4,M N∙
R5,1 R5,2 R5,3 R5,4 R5,5 R5,6 R5,7 … R5, M N∙
R6,1 R6,2 R6,3 R6,4 R6,5 R6,6 R6,7 … R6, M N∙
R7,1 R7,2 R7,3 R7,4 R7,5 R7,6 R7,7 … R7, M N∙
… … … … … … … … …
RM N∙ ,1
RM N∙ ,2
RM N∙ ,3
RM N∙ ,4
RM N∙ ,5
RM N∙ ,6
RM N∙ ,7
…RM N,M N∙ ∙
51Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Permutations Execution Times
52Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Permutations Execution Times
Different workload for different permutations, therefore changing the order of permutations’ execution may improve core utilization.
53Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Parallelization Opportunities
Different permutations work simultaneouslyDifferent chips can work simultaneouslyFiner grain parallelism of Rcell granularity (an instance per Rcell)
54Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Platform Comparison
Our algorithm is optimal for shared memory platforms since Cov is shared by all coresWorking on distributed memory platforms will damage its efficiency as a result of communication overhead Plurality provides much higher performance-power utilization than Elta's grid computing
Plurality vs. Distributed Systems
55Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
56Parallel Covariance Matrix Creation - Final Presentation
Reduces calculation at run time by 50%Same tables used for all chips
April 18, 2010
Look-up Tables
Execute many data-independent calculations off-line and storing results as a memory efficient static look-up tables.
Concept:
Advantages:
57Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Holds relevant permutation info:
Permutations Table Look-up tables
Optimal table size: (4 6 + 8 2) bit 313 = ∙ ∙ ∙ 1.5 KBytes
Multipliers’ indexesBlock bordersZero iteration coordinates
58Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Maps each shift to a Rcell
Uses the regularity in the offset of the Rcell coordinates when shifting upwards (+1,+1) or leftwards(+Sub-ap size, +Sub-ap size)
Concept:
Offsets Table Look-up tables
Optimal table size: 2 (13 13 8) bit 313 = ∙ ∙ ∙ ∙ 106 KBytes
59Parallel Covariance Matrix Creation - Final Presentation
Cov is an Hermitian matrix.
April 18, 2010
Concept:Using matrix characteristics to reduce calculations
Important observation:
†R R , ,R i j R j i
Using matrix Characteristics
60Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Highlight:Building Cov’s upper triangle only and, if necessary,generate the lower triangle inexpensively
Advantages:Reduces calculations by half Requires less space for storing the Cov matrixMost eigendecomposition algorithms requires upper triangle only
Using matrix Characteristics
61Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
62Parallel Covariance Matrix Creation - Final Presentation
Results (x86)
April 18, 2010
6 8 16 26 32 33 360
0.02
0.04
0.06
0.08
0.1
0.12
NaiveOurs
Chip Size
Run
Tim
e [s
econ
ds]
Not optimized for x86
Different Chip Sizes
63Parallel Covariance Matrix Creation - Final Presentation
3 4 6 8 11 13 150
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
naiveOurs
Sub-ap Size
Run
Tim
e [s
econ
ds]
April 18, 2010
Results (x86)Different Sub-aparture Sizes
Not optimized for x86
64Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
65Parallel Covariance Matrix Creation - Final Presentation
Elta’s MVM AlgorithmPreliminary Algorithm
April 18, 2010
, ,2D FFT x YK K X YS ��������������
Elta’s Algorithm
32 32,, Fragmentation xX YX Y ��������������
32 32 32 32, , 2D IFFT x xX Y X YS ��������������
32 32 32 32, , x x
MVMX Y X YS MVM ��������������
32 32, , Attachment x
MVMX Y X Y ��������������
Original SAR Image is Segmented into Chips (32X32 Chip) . The chips overlap.
MVM is Applied to Each Chip. The Various Chips are Attached to Each Other and Forms a
Full Size MVM Image
Input image
Output image The chips overlap
66Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
2D-IFFT
INIT
Segmentation
Covarince
Eigenvalues
FFT
Main effort
1
2 FINISH
Attachment
Task Map - MVM
67Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
68Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Plurality’s HyperCore Architecture Line (HAL) family of massively parallel manycore processors includes:
16 to 256 32-bit RISC cores4-64 co-processors that include a Floating Point unit and a Multiplier/Divider. Each co- processor is shared by four RISC processorsShared memory architecture - 2 MB size. No level one cache.Hardware-based scheduler that supports a task-oriented programming modelA cycle accurate simulator that runs on a x86 platformIntegrated into Eclipse IDEAn emulator supporting Linux and Windows native environments
Plurality Platform
69Parallel Covariance Matrix Creation - Final Presentation
Plurality’s Platform
The emulator mimics the behavior of HAL's hardware scheduler while still running on a X86 processor and working on Linux/Windows-based environments.
April 18, 2010
Emulator
No need to change to new hardware and a new programming model The emulator is written in ANSI-C. (almost all compilers can compile it)It comes with a prebuilt Makefile and a Visual Studio solutionThe emulator calls each task with all its required information: its right task instance, right timing, and right core IDHowever, not cycle-accurate!
Advantages
70Parallel Covariance Matrix Creation - Final Presentation
Plurality’s Platform
April 18, 2010
A cycle-accurate hardware simulator, that simulates the exact behavior of real HAL hardware. The simulator is integrated into eclipse IDE, but is very hard to debug with.
Simulator
Cycle accurate simulation.Uses GNU’s well known binutils and GDB debuggerIntegrated into Eclipse IDEEase transition to hardware
Advantages
71Parallel Covariance Matrix Creation - Final Presentation
Plurality’s Platform
April 18, 2010
Implementations
Compilation of the whole MVM algorithm using Plurality's emulatorCompilation of our covariance matrix creation program using Plurality's simulatorUsing the Eclipse development environment to measure the cycle-accurate performance
72Parallel Covariance Matrix Creation - Final Presentation
MotivationOvercoming Plurality’s unimplemented featureAllow manual scheduling in order to preserve processing time
For a given task, limit the number of concurrent instances out of its defined quotaImplemented in Perl
April 18, 2010
Added FeaturesN of M pre-compiler
73Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
The naïve algorithm
74Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
2D-IFFT
INIT
Segmentation
Covarince
Eigenvalues
2D-FFT
FINISH
Attachment
Task Map - MVM on Plurality
X86
X86
Emulator
Simulator
75Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Results (complete MVM on emulator)
2D(I)FFT and eigendecomposition using Intel’s MKL as black-box on the X86Compiled to native x86 code, but not fully optimized
Catego
ry 1
Catego
ry 2
Catego
ry 3
Catego
ry 4
0
1
2
3
4
5
6
Series 1Series 2Series 3
Placeholder
76Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Results (building covariance on the simulator)
2 4 8 16 32 64 128 2560
1
2
3
4
5
6
7
8
9
10
11
Speedup for 61 Permutations
Cores
Cycle
s spe
edup
Chip size: 15x15Sub-Aparture size: 6x6
77Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Results (building covariance on the simulator)
2 4 8 16 32 64 128 2560123456789
10111213141516171819
Speedup for 113 Permutations
Cores
Cycle
s spe
edup
Chip size: 20x20Sub-Aparture size: 8x8
78Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
79Parallel Covariance Matrix Creation - Final Presentation
Future Projects
April 18, 2010
Completing MVM on Plurality
Implement a parallel algorithm for finding eigenvalues and vectors of a dense Hermitian matrix2D(I)FFT on Plurality using Plurality’s 1-D LibraryTask map Optimizations
80Parallel Covariance Matrix Creation - Final Presentation
Solving The Eigenvalues Problem
High complexityMany Algorithems: QR, SVD, D&C, Jacobi, etc.Many OTS solutions: Intel, AMD, IBM, GNU, LAPACK, NAG, FEAST, etc.Shared memory ∩ Parallel ∩ Open source C = ф
April 18, 2010
81Parallel Covariance Matrix Creation - Final Presentation
MRRR (Multiple Relatively Robust Representations)
Main features:Fast – O(n2) for nxn MatrixParallelMemory efficient – O(n2) for nxn MatrixComplex data structuresImplementation unavailable
Optimal for plurality’s Platform
April 18, 2010
82Parallel Covariance Matrix Creation - Final Presentation
Table of Contents Introduction Building the covariance matrix
The naïve algorithm Our algorithm
Terminology The Algorithm Optimizations Results
MVM on Plurality The MVM algorithm Plurality Platform Results
Future Projects Conclusions
April 18, 2010
83Parallel Covariance Matrix Creation - Final Presentation
Opportunities
April 18, 2010
Our algorithm is unique: no parallel solution has been available to date. This solution may be applied to other signal processing problemsImplementation of MRRR is possible, therefore, enabling the complete MVM algorithm to work on plurality's platformUsing our solution on plurality's platform may be very appealing since plurality provides higher performance-power utilization than Grid Computing and faster run time
84Parallel Covariance Matrix Creation - Final Presentation
Practical Implications
Plurality’s low power platform may enable integrating SAR
On satellitesOn Unmanned Aerial Vehicles (UAV’s)More implications …
April 18, 2010
87Parallel Covariance Matrix Creation - Final Presentation
Elta’s MVM AlgorithmAssembling SAR radar picture consists of 2 phases:
April 18, 2010
DATA ManipulationRMC, Adaptive Pre-Sum, MOCOMP,
Autofocus, Polar to Rectangular Interpolation
Filtering and2D IFFT
MVMProcess
Incoming radarEchoes SAR Image2D FFT
1. Conventional SAR
2. MVM method in SAR
Identify Target of InterestUpon a SAR Image
Obtain Virtual SARRaw DATA Corresponding
to The Selected Target
MVM SAR Imageof the selected target
Elta’s MVM Algorithm
Preliminary Algorithm
The MVM algorithm
88Parallel Covariance Matrix Creation - Final Presentation
Greatly reduces calculation at run timeSame table used for all chips
April 18, 2010
Our algorithm (Optimizations)
Concepts:Execute many calculations in advance, saving them in a memory efficient static look-up tables.
Look-up tables
Advantages:
Our algorithm
89Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Our algorithm (Optimizations)Look-up tables Permutation Table
M1x M1y M2x M2y Bx By REFx REFy
1 … … … … … … … …
… … … … … … … … …
… … … … … … … … …
313 … … … … … … … …
Holds relevant info for permutation
Our algorithm
90Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Our algorithm (Optimizations)Look-up tables Permutation Table
[M1x, M1y] are coordinates of first multiplier[M1x, M1y] are coordinates of second multiplierBx is the number of rows of the permutation blockBy is the number of cols of the permutation block [REFx, REFy] are the coordinates of the pixel at REF matrix
(at Zero iteration)
Optimal table size: (4*6+8*2)bit*313=1.565KBytes
Our algorithm
91Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Our algorithm (Optimizations)Look-up tables Offsets Table
Maps each shift to a pixel
We came across a regularity in the offset of the pixel coordinates when shifting upwards (+1,+1) or leftwards(+Sub-ap size, +Sub-ap size)
Concept:
Our algorithm
92Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Our algorithm (Optimizations)Look-up tables Offsets Table
First we create a general Matrix containing all possible pixel offsets
Matrix[i,j]- the offset when shifting i steps upwards and j steps leftwards
Table’s construction156
143
130
117
104
91 78 65 52 39 26 13 0
157
144
131
118
105
92 79 66 53 40 27 14 1
158
145
132
119
106
93 80 67 54 41 28 15 2
159
146
133
120
107
94 81 68 55 42 29 16 3
160
147
134
121
108
95 82 69 56 43 30 17 4
161
148
135
122
109
96 83 70 57 44 31 18 5
162
149
136
123
110
97 84 71 58 45 32 19 6
163
150
137
124
111
98 85 72 59 46 33 20 7
164
151
138
125
112
99 86 73 60 47 34 21 8
165
152
139
126
113
100
87 74 61 48 35 22 9
166
153
140
127
114
101
88 75 62 49 36 23 10
167
154
141
128
115
102
89 76 63 50 37 24 11
168
155
142
129
116
103
90 77 64 51 38 25 12
Our algorithm
93Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Our algorithm (Optimizations)Look-up tables Offsets Table
Then, we add each permutation’s Zero Iteration coordinates (x,y) to the matrix to form each permutation offsets table
Table’s construction
Coodrszero-iteration (x,y) +
156
143
130
117
104
91 78 65 52 39 26 13 0
157
144
131
118
105
92 79 66 53 40 27 14 1
158
145
132
119
106
93 80 67 54 41 28 15 2
159
146
133
120
107
94 81 68 55 42 29 16 3
160
147
134
121
108
95 82 69 56 43 30 17 4
161
148
135
122
109
96 83 70 57 44 31 18 5
162
149
136
123
110
97 84 71 58 45 32 19 6
163
150
137
124
111
98 85 72 59 46 33 20 7
164
151
138
125
112
99 86 73 60 47 34 21 8
165
152
139
126
113
100
87 74 61 48 35 22 9
166
153
140
127
114
101
88 75 62 49 36 23 10
167
154
141
128
115
102
89 76 63 50 37 24 11
168
155
142
129
116
103
90 77 64 51 38 25 12
Our algorithm
94Parallel Covariance Matrix Creation - Final PresentationApril 18, 2010
Our algorithm (Optimizations)Look-up tables Offsets TableTable’s construction
Coodrszero-iteration (x,y) +
156
143
130
117
104
91 78 65 52 39 26 13 0
157
144
131
118
105
92 79 66 53 40 27 14 1
158
145
132
119
106
93 80 67 54 41 28 15 2
159
146
133
120
107
94 81 68 55 42 29 16 3
160
147
134
121
108
95 82 69 56 43 30 17 4
161
148
135
122
109
96 83 70 57 44 31 18 5
162
149
136
123
110
97 84 71 58 45 32 19 6
163
150
137
124
111
98 85 72 59 46 33 20 7
164
151
138
125
112
99 86 73 60 47 34 21 8
165
152
139
126
113
100
87 74 61 48 35 22 9
166
153
140
127
114
101
88 75 62 49 36 23 10
167
154
141
128
115
102
89 76 63 50 37 24 11
168
155
142
129
116
103
90 77 64 51 38 25 12
313X
313X
Our algorithm
Optimal table size: (13*13*8)bit*313=52.9KBytes