Date post: 21-Mar-2020
Page 1: Matrix Multiplication Part I - University at Buffalobest.eng.buffalo.edu/Research/Lecture Series 2013/Matrix Multiplication.pdf · Data Motion in Parallel Multiplication In a parallel

Matrix Multiplication

Chapter I – Matrix Multiplication

By Gokturk Poyrazoglu

The State University of New York at Buffalo – BEST Group – Winter Lecture Series

Outline

1. Basic Algorithms and Notation

2. Structure and Efficiency

3. Block Matrices

4. Matrix-Vector Products

5. Parallel Matrix Multiplication

Matrix Notation

Matrix Operations

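Transposition : $C = A^T$, with $c_{ij} = a_{ji}$

Addition : $C = A + B$, with $c_{ij} = a_{ij} + b_{ij}$

Scalar-matrix Multiplication : $C = \alpha A$, with $c_{ij} = \alpha a_{ij}$

Matrix-matrix Multiplication : $C = AB$, with $c_{ij} = \sum_{k} a_{ik} b_{kj}$

Pointwise Multiplication : $C = A \mathbin{.*} B$, with $c_{ij} = a_{ij} b_{ij}$

Pointwise Division : $C = A \mathbin{./} B$, with $c_{ij} = a_{ij} / b_{ij}$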

Vector Notation

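Column Vector : $x \in \mathbb{R}^{n}$, an n-by-1 array with entries $x_1, \dots, x_n$

Row Vector : $x^T = [x_1, \dots, x_n]$, a 1-by-n array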

Vector Operations

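Scalar-vector Multiplication : $z = ax$, with $z_i = a x_i$

Vector Addition : $z = x + y$, with $z_i = x_i + y_i$

Inner Product (dot product) : $c = x^T y = \sum_{i=1}^{n} x_i y_i$

Vector Update : $y = ax + y$, with $y_i = a x_i + y_i$

Pointwise Vector Multiplication : $z = x \mathbin{.*} y$, with $z_i = x_i y_i$

Pointwise Vector Division : $z = x \mathbin{./} y$, with $z_i = x_i / y_i$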

Dot Product Algorithm

Dot product: $c = x^T y = \sum_{i=1}^{n} x_i y_i$

Algorithm :

The dot product is an O(n) operation: the amount of work scales linearly with the dimension n.

It requires n additions and n multiplications.
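
The loop behind this count can be sketched as follows. This is a minimal illustration, assuming vectors stored as Python lists with 0-based indexing; the function name `dot` is chosen here for illustration and is not taken from the slides.

```python
def dot(x, y):
    """Return the inner product of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    c = 0.0
    for i in range(len(x)):
        # One multiplication and one addition per component.
        c = c + x[i] * y[i]
    return c


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Each pass through the loop costs one multiplication and one addition, which is where the n + n count comes from.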

Saxpy (Vector Update) Algorithm

Vector update : $y = ax + y$

Algorithm :

The vector update is also an O(n) operation: the amount of work scales linearly with the dimension n.
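
A saxpy ("scalar a times x plus y") overwrites y with a*x + y. The sketch below assumes list storage and an in-place update; the name `saxpy` and the argument order are illustrative choices.

```python
def saxpy(a, x, y):
    """Overwrite y with a*x + y and return it."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):
        y[i] = y[i] + a * x[i]
    return y


print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```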

Matrix-vector Multiplication

$y = y + Ax$, where y and x are vectors and A is an m-by-n matrix

Algorithm:

This algorithm is called Row-oriented gaxpy.

This algorithm involves O(nm) work.
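
A row-oriented gaxpy can be sketched as a doubly nested loop whose inner loop is a dot product of one row of A with x. The code assumes the update has the form y = y + Ax with A stored as a list of rows; `gaxpy_row` is a hypothetical name.

```python
def gaxpy_row(A, x, y):
    """Overwrite y with y + A x, visiting A one row at a time."""
    m, n = len(A), len(x)
    for i in range(m):        # one pass per row of A
        for j in range(n):    # dot product of row i with x
            y[i] = y[i] + A[i][j] * x[j]
    return y


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(gaxpy_row(A, [7.0, 8.0], [0.0, 0.0, 0.0]))  # [23.0, 53.0, 83.0]
```

The innermost statement runs m*n times, which matches the O(nm) work estimate.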

Matrix-vector Multiplication

$y = y + Ax$, where y and x are vectors and A is an m-by-n matrix.

2nd Approach : Column Oriented Gaxpy

Example :

Algorithm : interchange the order of the two loops of the row-oriented version
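
With the loops interchanged, the outer loop runs over columns and the inner loop becomes a saxpy that adds x_j times column j of A to y. A minimal sketch under the same assumptions as before (list-of-rows storage, update y = y + Ax, hypothetical name `gaxpy_col`):

```python
def gaxpy_col(A, x, y):
    """Overwrite y with y + A x, visiting A one column at a time."""
    m, n = len(A), len(x)
    for j in range(n):        # one pass per column of A
        for i in range(m):    # saxpy: y = y + x[j] * (column j of A)
            y[i] = y[i] + A[i][j] * x[j]
    return y


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(gaxpy_col(A, [7.0, 8.0], [0.0, 0.0, 0.0]))  # [23.0, 53.0, 83.0]
```

Both orderings perform the same arithmetic; they differ only in the order in which the entries of A are accessed.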

Partitioning a Matrix

Consider a matrix A as a stack of row vectors

Row Partition Example:

$A = \begin{bmatrix} r_1^T \\ \vdots \\ r_m^T \end{bmatrix}$, so that $r_i^T$ is the i-th row of A

New algorithm for vector update

Partitioning a Matrix

Column Partition Example

$A = [\, c_1 \mid \cdots \mid c_n \,]$, so that $c_j$ is the j-th column of A

Algorithm for vector update with column partition

Count number of columns

Multiply a scalar with a vector

Colon Notation

kth row of matrix A: $A(k,:) = [a_{k1}, \dots, a_{kn}]$

kth column of matrix A: $A(:,k)$, the column with entries $a_{1k}, \dots, a_{mk}$

Rewrite the algorithm with row partitioning
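
In colon notation the row-partitioned update reads y(i) = y(i) + A(i,:) x. The sketch below mimics that with small helpers; `row`, `col`, `dot`, and `gaxpy_rows` are illustrative names, indices are 0-based (the slides count from 1), and A is assumed to be a list of rows.

```python
def row(A, k):
    """A(k,:) : a copy of row k of A (0-based)."""
    return A[k][:]


def col(A, k):
    """A(:,k) : column k of A (0-based) as a list."""
    return [r[k] for r in A]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def gaxpy_rows(A, x, y):
    """Overwrite y with y + A x using one dot product per row."""
    for i in range(len(A)):
        y[i] = y[i] + dot(row(A, i), x)   # y(i) = y(i) + A(i,:) x
    return y


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(row(A, 1), col(A, 0))                        # [3.0, 4.0] [1.0, 3.0, 5.0]
print(gaxpy_rows(A, [7.0, 8.0], [0.0, 0.0, 0.0]))  # [23.0, 53.0, 83.0]
```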

Outer Product Update

Update matrix A with the outer product of two vectors: $A = A + xy^T$

Algorithm for outer product update:

The algorithm involves O(nm) operations.
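
An outer product update adds x_i * y_j to every entry a_ij. A minimal sketch, assuming A is an m-by-n list of rows, x has length m, and y has length n; the name `outer_update` is hypothetical.

```python
def outer_update(A, x, y):
    """Overwrite the m-by-n matrix A with A + x y^T and return it."""
    for i in range(len(x)):
        for j in range(len(y)):
            A[i][j] = A[i][j] + x[i] * y[j]
    return A


A = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(outer_update(A, [1.0, 2.0, 3.0], [4.0, 5.0]))
# [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]
```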

Outer Product Update with Vector Mult.

Row update : $A(i,:) = A(i,:) + x_i y^T$ for each row i

Column update : $A(:,j) = A(:,j) + y_j x$ for each column j

Matrix – Matrix Multiplication

Dot Product

Linear Combination of left matrix columns

Sum of Outer Products

Matrix – Matrix Multiplication

Consider a matrix update by matrix multiplication: $C = C + AB$, with A m-by-r, B r-by-n, and C m-by-n

Triply Nested Loop Algorithm

This algorithm involves O(mnr) operations.
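
The triply nested loop can be sketched as below, assuming the update C = C + AB with A of size m-by-r, B of size r-by-n, and C of size m-by-n, all stored as lists of rows. The name `matmul_update` is an illustrative choice.

```python
def matmul_update(A, B, C):
    """Overwrite C with C + A B using the i-j-k loop ordering."""
    m, r, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            for k in range(r):
                # c_ij accumulates the dot product of row i of A and column j of B.
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_update(A, B, [[0, 0], [0, 0]]))  # [[19, 22], [43, 50]]
```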

Dot Product Matrix Multiplication

Saxpy Formulation

Suppose the matrices A and C are column-partitioned

Each column of C can be updated as follows : $c_j = c_j + \sum_{k=1}^{r} b_{kj} a_k$

Algorithm :

Simplified Algorithm :
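
In the saxpy formulation, column j of C is built up by adding b_kj times column k of A, one saxpy at a time. A sketch under the assumption that the update is C = C + AB with list-of-rows storage; `matmul_saxpy` is a hypothetical name, and the j-k-i loop order shown is one way to realize the column view.

```python
def matmul_saxpy(A, B, C):
    """Overwrite C with C + A B, updating one column of C at a time."""
    m, r, n = len(A), len(B), len(B[0])
    for j in range(n):            # column j of C
        for k in range(r):        # add b_kj times column k of A
            b_kj = B[k][j]
            for i in range(m):    # saxpy on column j of C
                C[i][j] = C[i][j] + b_kj * A[i][k]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_saxpy(A, B, [[0, 0], [0, 0]]))  # [[19, 22], [43, 50]]
```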

Flops

A flop is a single floating-point operation, one of:

1. Addition

2. Subtraction

3. Multiplication

4. Division
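
As a small illustration of flop counting, the sketch below walks the loop structure of an m-by-r times r-by-n matrix update and charges one multiplication and one addition per innermost step. The function name is hypothetical and the code only counts; it does no arithmetic on matrices.

```python
def count_matmul_flops(m, r, n):
    """Count flops for C = C + A B with A m-by-r and B r-by-n."""
    flops = 0
    for _i in range(m):
        for _j in range(n):
            for _k in range(r):
                flops += 2   # one multiply and one add in c_ij += a_ik * b_kj
    return flops


print(count_matmul_flops(2, 3, 4))  # 48
```

The count equals 2*m*n*r, consistent with the O(mnr) estimate for matrix-matrix multiplication.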

Complex Matrices

Consider a complex matrix A: $A = B + iC$

B is the real part of A

C is the imaginary part of A

Matrix Operations:

Transposition of A becomes conjugate transposition: $A^H = B^T - iC^T$

Structure and Efficiency

Outline

1. Band Matrices

2. Triangular Matrices

3. Diagonal Matrices

4. Symmetric Matrices

5. Permutation Matrices

Band Matrices

Consider a matrix A.

Matrix A has lower bandwidth p if $a_{ij} = 0$ when $i > j + p$

Matrix A has upper bandwidth q if $a_{ij} = 0$ when $j > i + q$
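
The two definitions can be turned into a small check: the smallest valid lower bandwidth is the largest i - j over the nonzero entries, and the smallest valid upper bandwidth is the largest j - i. The helper name `bandwidths` is an assumption made for this sketch, and exact comparison with zero is used for simplicity.

```python
def bandwidths(A):
    """Return (p, q): the smallest lower and upper bandwidths of A."""
    p = q = 0
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            if a != 0:
                p = max(p, i - j)   # entry lies i - j diagonals below the main one
                q = max(q, j - i)   # entry lies j - i diagonals above the main one
    return p, q


tridiagonal = [[2, 1, 0, 0],
               [1, 2, 1, 0],
               [0, 1, 2, 1],
               [0, 0, 1, 2]]
print(bandwidths(tridiagonal))  # (1, 1)
```

A diagonal matrix gives (0, 0) and a full upper triangular n-by-n matrix gives (0, n - 1).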

Triangular Matrix Multiplication

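Consider a matrix update by matrix multiplication, $C = C + AB$, where A and B are n-by-n upper triangular matrices

So the update takes the form of $c_{ij} = c_{ij} + \sum_{k=i}^{j} a_{ik} b_{kj}$ for $i \le j$

k runs from i to j, so the zero entries of A and B are skipped: $a_{ik} = 0$ for $k < i$ and $b_{kj} = 0$ for $k > j$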
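
A sketch of the restricted loops, assuming A and B are n-by-n upper triangular and the update is C = C + AB. Only entries with i <= j are touched, and k runs from i to j. The function name is hypothetical and indices are 0-based.

```python
def triu_matmul_update(A, B, C):
    """Overwrite C with C + A B for upper triangular A and B."""
    n = len(A)
    for i in range(n):
        for j in range(i, n):             # the product is upper triangular
            for k in range(i, j + 1):     # skip a_ik with k < i and b_kj with k > j
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C


A = [[1, 2], [0, 3]]
B = [[4, 5], [0, 6]]
print(triu_matmul_update(A, B, [[0, 0], [0, 0]]))  # [[4, 17], [0, 18]]
```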

Colon Notation Triangular Matrix Mult.

Diagonal Matrices

The lower and upper bandwidths are zero for all diagonal matrices.

Notation : $D = \mathrm{diag}(d_1, \dots, d_n)$

Pre-multiplication by D (the product DA) scales the rows of A

Post-multiplication by D (the product AD) scales the columns of A
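
Because a diagonal matrix only scales, neither product needs a full matrix multiplication. In this sketch the diagonal is stored as a plain list d; `scale_rows` and `scale_cols` are illustrative names.

```python
def scale_rows(d, A):
    """Return D A: row i of A multiplied by d[i]."""
    return [[d[i] * a for a in row] for i, row in enumerate(A)]


def scale_cols(A, d):
    """Return A D: column j of A multiplied by d[j]."""
    return [[a * d[j] for j, a in enumerate(row)] for row in A]


A = [[1, 1], [1, 1]]
print(scale_rows([2, 3], A))  # [[2, 2], [3, 3]]
print(scale_cols(A, [2, 3]))  # [[2, 3], [2, 3]]
```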

Symmetry

Identity and Permutation Matrix

Identity Matrix:

Permutation Matrix

Famous Permutation Matrices

Exchange Permutation

Downshift Permutation
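
The action of these two permutations on a vector can be sketched without forming the matrices. The sketch assumes the usual definitions: the exchange permutation reverses the order of the components, and the downshift permutation moves every component down one position with wraparound. The function names are chosen for illustration.

```python
def exchange(x):
    """Apply the exchange permutation: reverse the components of x."""
    return x[::-1]


def downshift(x):
    """Apply the downshift permutation: shift down by one with wraparound."""
    return x[-1:] + x[:-1]


x = [1, 2, 3, 4]
print(exchange(x))   # [4, 3, 2, 1]
print(downshift(x))  # [4, 1, 2, 3]
```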

Block Matrix Terminology

Column and row partitionings are special cases of block partitioning.

Examples

Block Matrix Operations

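Scalar Multiplication : every block is scaled, $\mu [A_{ij}] = [\mu A_{ij}]$

Transposition : blocks swap positions and are themselves transposed, so block (i,j) of $A^T$ is $A_{ji}^T$

Addition: blockwise for conformably partitioned matrices, $[A_{ij}] + [B_{ij}] = [A_{ij} + B_{ij}]$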

Block Matrix Multiplication

The column dimensions of A must match the row dimensions of B.

Block Vector Update

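Consider a block matrix A and a block vector y

Block Vector update : $y_i = y_i + \sum_j A_{ij} x_j$, where x is a block vector partitioned to match the block columns of A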

Algorithm :

Block Matrix Update

Consider a matrix update by matrix multiplication, $C = C + AB$, carried out block by block: $C_{ij} = C_{ij} + \sum_k A_{ik} B_{kj}$

Algorithm:
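
A blocked update can be sketched as three outer loops over block indices, each combination handling one block product. The code assumes the update C = C + AB, list-of-rows storage, and a single square block size `bs` (a hypothetical parameter); blocks on the edges may be smaller.

```python
def block_matmul_update(A, B, C, bs):
    """Overwrite C with C + A B, processed in bs-by-bs blocks."""
    m, r, n = len(A), len(B), len(B[0])
    for i0 in range(0, m, bs):
        for j0 in range(0, n, bs):
            for k0 in range(0, r, bs):
                # One block step: C_ij = C_ij + A_ik B_kj
                for i in range(i0, min(i0 + bs, m)):
                    for j in range(j0, min(j0 + bs, n)):
                        for k in range(k0, min(k0 + bs, r)):
                            C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Z = [[0, 0, 0] for _ in range(3)]
print(block_matmul_update(A, I3, Z, 2) == A)  # True
```

The arithmetic is identical to the unblocked triple loop; only the order of the operations changes, which is what makes blocking relevant to data motion.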

Complex Matrix Multiplication

Consider complex matrices A, B, and C, each written as a real part plus i times an imaginary part, where

all component matrices are real and $i^2 = -1$

Multiplication:

Note: when written with real matrices, complex matrix multiplication has expanded dimension.
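
To make the real-arithmetic form concrete, write A = A1 + i A2 and B = B1 + i B2 (these names are assumptions for this sketch, not the slide's notation). Then C = AB has real part A1 B1 - A2 B2 and imaginary part A1 B2 + A2 B1, which is the same as multiplying the 2-by-2 block matrix [[A1, -A2], [A2, A1]] by the stacked matrix [B1; B2]. That block form is the "expanded dimension".

```python
def matmul(A, B):
    """Plain real matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def combine(X, Y, sign):
    """Return X + sign * Y entrywise."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def complex_matmul(A1, A2, B1, B2):
    """Return (C1, C2), the real and imaginary parts of (A1 + iA2)(B1 + iB2)."""
    C1 = combine(matmul(A1, B1), matmul(A2, B2), -1)   # real part
    C2 = combine(matmul(A1, B2), matmul(A2, B1), +1)   # imaginary part
    return C1, C2


# 1-by-1 check: (1 + 2i)(3 + 4i) = -5 + 10i
print(complex_matmul([[1]], [[2]], [[3]], [[4]]))  # ([[-5]], [[10]])
```

This version uses four real matrix products and two real matrix additions.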

Hamiltonian Matrix

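Form : $M = \begin{bmatrix} A & G \\ F & -A^T \end{bmatrix}$

1. A, F, and G are square matrices

2. F and G are symmetric matrices

Hamiltonian Check

1. Consider the signed permutation matrix $J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$

2. If $J M J^T = -M^T$, then M is a Hamiltonian matrix.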
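
The check can be sketched directly, assuming the standard characterization: with J = [[0, I], [-I, 0]], a 2n-by-2n matrix M is Hamiltonian exactly when J M J^T = -M^T. The helper names below are illustrative, and the comparison is exact, so it suits integer test data; floating-point input would need a tolerance.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(r) for r in zip(*A)]


def make_J(n):
    """Build the 2n-by-2n matrix [[0, I], [-I, 0]]."""
    J = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = 1
        J[n + i][i] = -1
    return J


def is_hamiltonian(M):
    """Return True if J M J^T equals -M^T (exact comparison)."""
    size = len(M)
    if size % 2 != 0 or any(len(row) != size for row in M):
        return False
    J = make_J(size // 2)
    lhs = matmul(matmul(J, M), transpose(J))
    rhs = [[-v for v in row] for row in transpose(M)]
    return lhs == rhs


print(is_hamiltonian([[1, 2], [3, -1]]))  # True  (A = 1, G = 2, F = 3)
print(is_hamiltonian([[1, 2], [3, 4]]))   # False
```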

Vector Processing

3-cycle Adder

Pipelined Addition for a vector operation

Data Motion in Computation Performance

Besides flop counts, data motion is also an important factor when reasoning about performance.

Load/Store Operations

Vector Update (Gaxpy) : (3+n) load/store operations

Outer Product Matrix Update : (2+2n) load/store operations

Memory Hierarchy

Each level has a limited capacity.

There is a cost associated with moving data between two levels.

An efficient implementation should consider the data flow between the various levels.

Parallel Matrix Multiplication

Design of a parallel procedure begins with the breaking up of the given problem into smaller parts that exhibit a measure of INDEPENDENCE.

Consider block matrices A, B, and C.

The task is to compute: $C = C + AB$
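
One way to expose independence is to make each block of C its own task, since different blocks of C are written by different tasks and never overlap. The sketch below assumes the update C = C + AB and uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the function names, the block size `bs`, and the worker count are illustrative. In CPython, threads will not speed up pure-Python arithmetic, so this shows the task decomposition rather than a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor


def update_block(A, B, C, rows, cols):
    """Compute one block of C = C + A B; writes only C[i][j] for i in rows, j in cols."""
    r = len(B)
    for i in rows:
        for j in cols:
            for k in range(r):
                C[i][j] += A[i][k] * B[k][j]


def parallel_matmul_update(A, B, C, bs, workers=4):
    """Overwrite C with C + A B, one independent task per bs-by-bs block of C."""
    m, n = len(A), len(B[0])
    tasks = [(range(i0, min(i0 + bs, m)), range(j0, min(j0 + bs, n)))
             for i0 in range(0, m, bs) for j0 in range(0, n, bs)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(update_block, A, B, C, rows, cols)
                   for rows, cols in tasks]
        for f in futures:
            f.result()   # wait for completion and surface any worker exception
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(parallel_matmul_update(A, B, [[0, 0], [0, 0]], bs=1))  # [[19, 22], [43, 50]]
```

Each task reads rows of A and columns of B but writes a disjoint part of C, so no locking is needed.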

Data Motion in Parallel Multiplication

In a parallel computing environment, the data that a processor needs can be “far away”, and if that is the case too often, then it is possible to lose the multiprocessor advantage.

Time spent waiting for another processor to finish a task is TIME LOST.
