Matrix Multiplication
Chapter I – Matrix Multiplication
By Gokturk Poyrazoglu
The State University of New York at Buffalo – BEST Group – Winter Lecture Series
Outline
1. Basic Algorithms and Notation
2. Structure and Efficiency
3. Block Matrices
4. Matrix-Vector Products
5. Parallel Matrix Multiplication
Matrix Notation
A ∈ R^(m x n) denotes an m-by-n real matrix with entries a_ij.
Matrix Operations
Transposition : C = A^T  =>  c_ij = a_ji
Addition : C = A + B  =>  c_ij = a_ij + b_ij
Scalar-matrix Multiplication : C = αA  =>  c_ij = α·a_ij
Matrix-matrix Multiplication : C = AB  =>  c_ij = Σ_k a_ik·b_kj
Pointwise Multiplication : C = A .* B  =>  c_ij = a_ij·b_ij
Pointwise Division : C = A ./ B  =>  c_ij = a_ij / b_ij
Vector Notation
Column Vector : x ∈ R^n, x = (x_1, ..., x_n)^T
Row Vector : x^T = (x_1, ..., x_n)
Vector Operations
Scalar-vector Multiplication : y = a·x  =>  y_i = a·x_i
Vector Addition : z = x + y  =>  z_i = x_i + y_i
Inner Product (dot product) : c = x^T·y = Σ_i x_i·y_i
Vector Update : y = a·x + y  =>  y_i = a·x_i + y_i
Pointwise Vector Multiplication : z = x .* y  =>  z_i = x_i·y_i
Pointwise Vector Division : z = x ./ y  =>  z_i = x_i / y_i
Dot Product Algorithm
Dot product: c = x^T·y = x(1)y(1) + x(2)y(2) + ... + x(n)y(n)
Algorithm :
The dot product operation is an O(n) operation.
The amount of work scales linearly with the dimension:
n additions and n multiplications.
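A minimal Python sketch of the dot product loop described above (plain lists; the function name is mine):

```python
def dot(x, y):
    """Dot product c = x^T y: n multiplications and n additions, so O(n) work."""
    c = 0.0
    for i in range(len(x)):
        c += x[i] * y[i]
    return c
```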
Saxpy (Vector Update) Algorithm
Vector update : y = a·x + y
Algorithm :
The vector update operation is also an O(n) operation.
The amount of work scales linearly with the dimension
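A corresponding Python sketch of the saxpy update y = a·x + y (plain lists; the function name is mine):

```python
def saxpy(a, x, y):
    """Vector update y <- a*x + y, overwriting y in place; O(n) work."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y
```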
Matrix-vector Multiplication
y = A·x + y, where y and x are vectors and A is an m-by-n matrix
Algorithm:
This algorithm is called Row-oriented gaxpy.
This algorithm involves O(nm) work.
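The row-oriented gaxpy can be sketched in Python as follows (plain lists of rows; the function name is mine). The inner loop computes a dot product of row i with x:

```python
def gaxpy_row(A, x, y):
    """Row-oriented gaxpy y <- y + A@x: each y[i] accumulates a dot product."""
    m, n = len(A), len(x)
    for i in range(m):
        for j in range(n):
            y[i] += A[i][j] * x[j]
    return y
```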
Matrix-vector Multiplication
y = A·x + y, where y and x are vectors and A is an m-by-n matrix.
2nd Approach : Column Oriented Gaxpy
Example :
Algorithm : obtained by interchanging the order of the two loops in the row-oriented version.
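A Python sketch of the column-oriented variant (the function name is mine). Only the loop order changes: y now accumulates x[j] times the j-th column of A:

```python
def gaxpy_col(A, x, y):
    """Column-oriented gaxpy y <- y + A@x: y accumulates scalar-times-column saxpys."""
    m, n = len(A), len(x)
    for j in range(n):
        for i in range(m):
            y[i] += A[i][j] * x[j]
    return y
```

The result is identical to the row-oriented version; the two differ only in memory access pattern.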
Partitioning a Matrix
Consider a matrix A as a stack of row vectors
Row Partition Example:
A = [ a_1^T ; a_2^T ; ... ; a_m^T ]  so that  y(i) = y(i) + a_i^T·x for i = 1:m
New algorithm for vector update
Partitioning a Matrix
Column Partition Example
A = [ c_1 | c_2 | ... | c_n ]  so that  y = y + x(1)·c_1 + x(2)·c_2 + ... + x(n)·c_n
Algorithm for vector update with column partition:
loop over the n columns; at each step, multiply the scalar x(j)
with the column vector c_j and accumulate into y.
Colon Notation
kth row of matrix A: A(k, :)
kth column of matrix A: A(:, k)
Rewrite the algorithm with row partitioning
Outer Product Update
Update matrix A with the outer product of two vectors: A = A + x·y^T
Algorithm for outer product update:
The algorithm involves O(nm) operations.
Outer Product Update with Vector Mult.
Row update: A(i, :) = A(i, :) + x(i)·y^T.  Column update: A(:, j) = A(:, j) + y(j)·x.
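A Python sketch of the outer product update A = A + x·y^T (plain lists of rows; the function name is mine):

```python
def outer_update(A, x, y):
    """Outer product update A <- A + x @ y^T; O(mn) operations."""
    for i in range(len(x)):
        for j in range(len(y)):
            A[i][j] += x[i] * y[j]
    return A
```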
Matrix – Matrix Multiplication
Dot Product : c_ij = A(i, :)·B(:, j)
Linear Combination of left matrix columns : C(:, j) = Σ_k B(k, j)·A(:, k)
Sum of Outer Products : C = Σ_k A(:, k)·B(k, :)
Matrix – Matrix Multiplication
Consider a matrix update by matrix multiplication: C = C + A·B,
where A is m-by-r, B is r-by-n, and C is m-by-n.
Triply Nested Loop Algorithm
This algorithm involves O(mnr) operations.
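The triply nested loop can be sketched in Python as follows (plain lists of rows; the function name is mine):

```python
def matmul_update(A, B, C):
    """Triply nested loop for C <- C + A@B: O(m*n*r) operations."""
    m, r, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            for k in range(r):
                C[i][j] += A[i][k] * B[k][j]
    return C
```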
Dot Product Matrix Multiplication
Saxpy Formulation
Suppose matrices A and C are column partitioned
Each column of C can be updated as follows :
Algorithm : a double loop of saxpy updates over the columns; colon notation gives a simplified form.
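A Python sketch of the saxpy formulation (the function name is mine): column j of C is built up by saxpy updates, one per column of A, weighted by the entries of B:

```python
def matmul_saxpy(A, B, C):
    """Saxpy formulation of C <- C + A@B: column j of C accumulates
    B[k][j] times column k of A."""
    m, r, n = len(A), len(B), len(B[0])
    for j in range(n):
        for k in range(r):
            # saxpy: C(:, j) <- B[k][j] * A(:, k) + C(:, j)
            for i in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C
```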
Flops
A flop is a single floating-point operation, i.e. any one of:
1. Addition
2. Subtraction
3. Multiplication
4. Division
Complex Matrices
Consider a complex matrix A = B + i·C, where:
B is the real part of A;
C is the imaginary part of A
Matrix Operations:
Transposition of A becomes conjugate transposition.
Structure and Efficiency
Outline
1. Band Matrices
2. Triangular Matrices
3. Diagonal Matrices
4. Symmetric Matrices
5. Permutation Matrices
Band Matrices
Consider a matrix A;
Matrix A has lower bandwidth p if a_ij = 0 whenever i > j + p.
Matrix A has upper bandwidth q if a_ij = 0 whenever j > i + q.
Triangular Matrix Multiplication
Consider a matrix update by matrix multiplication
For upper triangular A and B, the update takes the form
[AB]_ij = Σ_{k=i}^{j} a_ik·b_kj,
so k runs only from i to j and the known zero entries of [AB] are skipped.
Colon Notation Triangular Matrix Mult.
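A Python sketch for two upper triangular matrices (the function name is mine): the index k runs from i to j, and only the upper triangle of C is computed, since the rest is known to be zero:

```python
def tri_matmul(A, B):
    """Product C = A@B of two n-by-n upper triangular matrices;
    k runs from i to j, skipping the structurally zero entries."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            for k in range(i, j + 1):
                C[i][j] += A[i][k] * B[k][j]
    return C
```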
Diagonal Matrices
Both the lower and upper bandwidth are zero for a diagonal matrix.
Notation : D = diag(d_1, d_2, ..., d_n)
Pre-multiplication of D scales rows of A
Post-multiplication of D scales columns of A
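The two scaling rules can be sketched in Python (plain lists of rows; d holds the diagonal entries, and the function names are mine):

```python
def pre_scale(d, A):
    """D@A with D = diag(d): scales row i of A by d[i]."""
    return [[d[i] * A[i][j] for j in range(len(A[0]))] for i in range(len(A))]

def post_scale(A, d):
    """A@D with D = diag(d): scales column j of A by d[j]."""
    return [[A[i][j] * d[j] for j in range(len(A[0]))] for i in range(len(A))]
```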
Symmetry
A matrix A is symmetric if A^T = A, i.e. a_ij = a_ji for all i and j.
Identity and Permutation Matrix
Identity Matrix: I_n = diag(1, 1, ..., 1)
Permutation Matrix: the identity matrix with its rows reordered.
Famous Permutation Matrices
Exchange Permutation: reverses the order of the entries (ones along the antidiagonal).
Downshift Permutation: cyclically shifts the entries down by one position.
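The effect of these two permutations on a vector can be sketched directly, without forming the matrices (the function names are mine):

```python
def exchange(x):
    """Apply the exchange permutation: reverses the order of the entries."""
    return x[::-1]

def downshift(x):
    """Apply the downshift permutation: (x1, ..., xn) -> (xn, x1, ..., x_{n-1})."""
    return [x[-1]] + x[:-1]
```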
Block Matrix Terminology
Column and row partitioning are special cases of block partitioning.
Examples
Block Matrix Operations
Scalar Multiplication : multiply each block by the scalar.
Transposition : transpose the array of blocks and each individual block: (A^T)_ij = (A_ji)^T.
Addition : add conformally partitioned matrices block by block.
Block Matrix Multiplication
The column dimensions of the blocks of A must match the row dimensions of the blocks of B.
Block Vector Update
Consider a block matrix A, a block vector x, and a block vector y.
Block Vector update : y_i = y_i + Σ_j A_ij·x_j
Algorithm :
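A Python sketch of the block vector update (the function name is mine): Ablocks is a grid of matrices, and xblocks, yblocks are lists of vectors; each block of y accumulates the products A_ij·x_j:

```python
def block_gaxpy(Ablocks, xblocks, yblocks):
    """Block vector update y_i <- y_i + sum_j A_ij @ x_j, where Ablocks is a
    grid of matrices and xblocks, yblocks are conformally sized vectors."""
    for i, row in enumerate(Ablocks):
        for j, Aij in enumerate(row):
            for u in range(len(yblocks[i])):
                for v in range(len(xblocks[j])):
                    yblocks[i][u] += Aij[u][v] * xblocks[j][v]
    return yblocks
```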
Block Matrix Update
Consider a matrix update by matrix multiplication.
Algorithm:
Complex Matrix Multiplication
Consider complex matrices A = A_1 + i·A_2, B = B_1 + i·B_2, and C = C_1 + i·C_2,
where A_1, A_2, B_1, B_2, C_1, C_2 are real and i^2 = -1.
Multiplication: C_1 + i·C_2 = (A_1·B_1 - A_2·B_2) + i·(A_1·B_2 + A_2·B_1)
Note: the equivalent all-real formulation has expanded dimension:
[ C_1  -C_2 ; C_2  C_1 ] = [ A_1  -A_2 ; A_2  A_1 ]·[ B_1  -B_2 ; B_2  B_1 ]
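The four-real-product formulation can be sketched in Python (plain lists of rows; the function names are mine):

```python
def real_matmul(A, B):
    """Real dense product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def complex_matmul(A1, A2, B1, B2):
    """(A1 + i*A2)(B1 + i*B2) = (A1@B1 - A2@B2) + i*(A1@B2 + A2@B1)."""
    m, n = len(A1), len(B1[0])
    P, Q = real_matmul(A1, B1), real_matmul(A2, B2)
    R, S = real_matmul(A1, B2), real_matmul(A2, B1)
    C1 = [[P[i][j] - Q[i][j] for j in range(n)] for i in range(m)]
    C2 = [[R[i][j] + S[i][j] for j in range(n)] for i in range(m)]
    return C1, C2
```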
Hamiltonian Matrix
Form : M = [ A  F ; G  -A^T ], where
1. A, F, and G are square matrices
2. F and G are symmetric matrices
Hamiltonian Check
1. Consider the matrix J = [ 0  I ; -I  0 ]
2. If J^{-1}·M·J = -M^T, then M is a Hamiltonian matrix.
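This check can be sketched in Python for a 2n-by-2n matrix M stored as a list of rows (helper and function names are mine):

```python
def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_hamiltonian(M):
    """True iff J^{-1} M J == -M^T, where J = [[0, I], [-I, 0]]."""
    two_n = len(M)
    n = two_n // 2
    # J = [[0, I], [-I, 0]]; its inverse equals J^T = -J
    J = [[1.0 if j == i + n else (-1.0 if j == i - n else 0.0)
          for j in range(two_n)] for i in range(two_n)]
    Jinv = [[-J[i][j] for j in range(two_n)] for i in range(two_n)]
    lhs = matmul(matmul(Jinv, M), J)
    neg_Mt = [[-M[j][i] for j in range(two_n)] for i in range(two_n)]
    return lhs == neg_Mt
```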
Vector Processing
3-cycle Adder
Pipelined Addition for a vector operation
Data Motion in Computation Performance
Other than flop counts, data motion is also an important factor when reasoning about performance.
Load/Store Operations
Vector update (gaxpy) y = A·x + y : (3 + n)·n = n^2 + 3n load/store operations.
Outer product update A = A + x·y^T : (2 + 2n)·n = 2n^2 + 2n load/store operations.
Both cost about 2n^2 flops, but the outer product update moves roughly twice as much data.
Memory Hierarchy
Each level has a limited capacity.
There is a cost associated with moving data between two levels.
Efficient Implementation should consider data flow between the various levels.
Parallel Matrix Multiplication
Design of a parallel procedure begins with breaking the given problem
into smaller parts that exhibit a measure of INDEPENDENCE.
Consider block matrices A, B, and C
The task is to compute:
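One way to sketch this in Python (all names are mine; stdlib threads stand in for real processors): each s-by-s block of C is an independent task, C_ij = Σ_k A_ik·B_kj, so the tasks can be handed to a pool of workers:

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def block(M, i, j, s):
    """Extract the s-by-s block (i, j) of M."""
    return [row[j * s:(j + 1) * s] for row in M[i * s:(i + 1) * s]]

def parallel_block_matmul(A, B, s, workers=4):
    """C = A@B, one independent task per s-by-s block of C.
    Assumes square n-by-n inputs with s dividing n."""
    n = len(A)
    q = n // s  # number of blocks per dimension

    def task(i, j):
        # C_ij = sum_k A_ik @ B_kj -- reads A and B, writes only its own block
        Cij = [[0.0] * s for _ in range(s)]
        for k in range(q):
            P = matmul(block(A, i, k, s), block(B, k, j, s))
            for u in range(s):
                for v in range(s):
                    Cij[u][v] += P[u][v]
        return i, j, Cij

    C = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(task, i, j) for i in range(q) for j in range(q)]
        for f in futures:
            i, j, Cij = f.result()
            for u in range(s):
                for v in range(s):
                    C[i * s + u][j * s + v] = Cij[u][v]
    return C
```

Because no task writes data that another task reads, the blocks exhibit exactly the independence the design calls for.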
Data Motion in Parallel Multiplication
In a parallel computing environment, the data that a
processor needs can be “far away”, and if that is the case
too often, then it is possible to lose the multiprocessor
advantage.
Time spent waiting for another processor to finish a task
is TIME LOST.