Parallel Inversion of Polynomial Matrices
Alina Solovyova-Vincent
Frederick C. Harris, Jr.
M. Sami Fadali
Overview
Introduction
Existing algorithms
Busłowicz’s algorithm
Parallel algorithm
Results
Conclusions and future work
Definitions
A polynomial matrix is a matrix which has polynomials in all of its entries.
$H(s) = H_n s^n + H_{n-1} s^{n-1} + H_{n-2} s^{n-2} + \dots + H_0$,
where $H_i$ are constant $r \times r$ matrices, $i = 0, \dots, n$.
Definitions
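Example:
$$H(s) = \begin{bmatrix} s+2 & s^3+3s^2+s \\ s^3 & s^2+1 \end{bmatrix}$$
n = 3 – degree of the polynomial matrix
r = 2 – the size of the matrix H
$$H_0 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad H_1 = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \quad \dots$$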
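The coefficient-matrix form can be sketched in C as follows. This is an illustration only, not the authors' code: the names `H_EXAMPLE` and `polymat_eval` are hypothetical, and the matrices for $s^2$ and $s^3$ are read off the entries of the example $H(s)$ (s+2, s^3+3s^2+s, s^3, s^2+1) rather than taken from the slide, which elides them.

```c
#include <stddef.h>

#define R 2   /* matrix size r of the example */
#define N 3   /* polynomial degree n of the example */

/* H_EXAMPLE[i] holds the constant matrix multiplying s^i. */
static const double H_EXAMPLE[N + 1][R][R] = {
    {{2, 0}, {0, 1}},   /* H_0: constant terms */
    {{1, 1}, {0, 0}},   /* H_1: coefficients of s */
    {{0, 3}, {0, 1}},   /* H_2: coefficients of s^2 (derived from H(s)) */
    {{0, 1}, {1, 0}},   /* H_3: coefficients of s^3 (derived from H(s)) */
};

/* Evaluate H(s) = H_n s^n + ... + H_0 at a scalar s, entry by entry,
 * using Horner's rule. */
void polymat_eval(const double H[N + 1][R][R], double s, double out[R][R])
{
    for (size_t a = 0; a < R; a++) {
        for (size_t b = 0; b < R; b++) {
            double acc = H[N][a][b];
            for (int i = N - 1; i >= 0; i--)
                acc = acc * s + H[i][a][b];
            out[a][b] = acc;
        }
    }
}
```

Calling `polymat_eval(H_EXAMPLE, 1.0, out)` fills `out` with the value of the example matrix at s = 1. Storing the polynomial matrix as a stack of constant matrices is what lets an algorithm work purely on constant matrices.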
Definitions
$H^{-1}(s)$ – inverse of the matrix $H(s)$
One of the ways to calculate it:
$H^{-1}(s) = \operatorname{adj} H(s) / \det H(s)$
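As a worked instance of this formula, take the 2 x 2 example matrix with entries s+2, s^3+3s^2+s, s^3 and s^2+1. The expansion below is hand-computed for illustration and does not come from the slides:

```latex
\[
H(s) = \begin{bmatrix} s+2 & s^3+3s^2+s \\ s^3 & s^2+1 \end{bmatrix},
\qquad
\operatorname{adj} H(s) = \begin{bmatrix} s^2+1 & -(s^3+3s^2+s) \\ -s^3 & s+2 \end{bmatrix}
\]
\[
\det H(s) = (s+2)(s^2+1) - s^3\,(s^3+3s^2+s)
          = -s^6 - 3s^5 - s^4 + s^3 + 2s^2 + s + 2
\]
\[
H^{-1}(s) = \frac{\operatorname{adj} H(s)}{\det H(s)}
\]
```

The numerator is a polynomial matrix and the denominator is a scalar polynomial, so the inverse is a rational matrix.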
Definitions
A rational matrix can be expressed as a ratio of a numerator polynomial matrix and a denominator scalar polynomial.
Who Needs It???
Multivariable control systems
Analysis of power systems
Robust stability analysis
Design of linear decoupling controllers
… and many more areas.
Existing Algorithms
Leverrier’s algorithm (1840)
[sI - H] - resolvent matrix
Exact algorithms
Approximation methods
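For orientation, the resolvent of a constant matrix can be computed with the Faddeev form of Leverrier's recursion. The sketch below is a textbook version and may differ in detail from the variant the slides refer to. The function name `leverrier`, the bound `MAXN` and the array layout are assumptions made for this example.

```c
#define MAXN 8   /* hypothetical upper bound on the matrix size */

/* Faddeev-Leverrier recursion for (sI - A)^{-1} = adj(sI - A) / det(sI - A).
 * On return:
 *   c[0..n]  : det(sI - A) = c[n] s^n + ... + c[0], with c[n] = 1
 *   M[1..n]  : adj(sI - A) = M[1] s^(n-1) + M[2] s^(n-2) + ... + M[n]
 * M[0] is used as a zero matrix to start the recursion. */
void leverrier(int n, double A[MAXN][MAXN],
               double M[MAXN + 1][MAXN][MAXN], double c[MAXN + 1])
{
    c[n] = 1.0;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            M[0][a][b] = 0.0;

    for (int k = 1; k <= n; k++) {
        /* M_k = A * M_{k-1} + c_{n-k+1} * I */
        for (int a = 0; a < n; a++) {
            for (int b = 0; b < n; b++) {
                double sum = 0.0;
                for (int t = 0; t < n; t++)
                    sum += A[a][t] * M[k - 1][t][b];
                M[k][a][b] = sum + (a == b ? c[n - k + 1] : 0.0);
            }
        }
        /* c_{n-k} = -trace(A * M_k) / k */
        double trace = 0.0;
        for (int a = 0; a < n; a++)
            for (int t = 0; t < n; t++)
                trace += A[a][t] * M[k][t][a];
        c[n - k] = -trace / k;
    }
}
```

For A = [[2, 1], [0, 3]] the recursion yields det(sI - A) = s^2 - 5s + 6 and adj(sI - A) = I s + [[-3, 1], [0, -2]]. Like Busłowicz's algorithm, it only manipulates constant matrices.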
The Selection of the Algorithm
Before Busłowicz’s algorithm (1980):
Large degree of polynomial operations
Lengthy calculations
Not very general
After:
Some improvements at the cost of increased computational complexity
Busłowicz’s Algorithm
Benefits:
More general than methods proposed earlier
Only requires operations on constant matrices
Suitable for computer programming
Drawback: the irreducible form cannot be ensured in general
Details of the Algorithm
Available upon request
Challenges Encountered (sequential)
Several inconsistencies in the original paper:
Challenges Encountered (parallel)
Dependent loops
for (i=2; i<r+1; i++) {
    for (k=0; k<n*i+1; k++) {
        calculations requiring R[i-1][k]
    }
}
$O(n^2 r^4)$
Challenges Encountered (parallel)
Loops of variable length
for (k=0; k<n*i+1; k++) {
    for (ll=0; ll<min+1; ll++) {   /* min varies with k */
        main calculations
    }
}
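To see why variable-length loops complicate parallelization, compare two ways of dividing the k iterations among processes. The sketch is illustrative only: the cost model (iteration k costs k + 1 inner steps) and the names `block_load` and `cyclic_load` are assumptions, and the slides do not say which distribution the authors used.

```c
/* Hypothetical cost model: iteration k runs an inner loop of k + 1 steps. */
static long cost(int k)
{
    return (long)k + 1;
}

/* Inner-loop steps done by process `rank` when iterations 0..count-1
 * are split into contiguous blocks, one block per process. */
long block_load(int count, int nprocs, int rank)
{
    int chunk = (count + nprocs - 1) / nprocs;   /* ceiling division */
    int lo = rank * chunk;
    int hi = (lo + chunk < count) ? lo + chunk : count;
    long total = 0;
    for (int k = lo; k < hi; k++)
        total += cost(k);
    return total;
}

/* Same, but iterations are dealt out round-robin:
 * process `rank` takes k = rank, rank + nprocs, rank + 2*nprocs, ... */
long cyclic_load(int count, int nprocs, int rank)
{
    long total = 0;
    for (int k = rank; k < count; k += nprocs)
        total += cost(k);
    return total;
}
```

With 8 iterations and 2 processes, the block split gives loads of 10 and 26 steps, while the round-robin split gives 16 and 20. Under this cost model, equal iteration counts do not mean equal work.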
Shared and Distributed Memory
Main differences
Synchronization of the processes:
Shared memory (barrier)
Distributed memory (data exchange)
for (i=2; i<r+1; i++) {
    calculations requiring R[i-1]
    /* synchronization point */
}
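A minimal shared-memory sketch of this pattern is given below. The slides do not name the threading API, so OpenMP is assumed here. The recurrence is a toy stand-in (a prefix sum over the previous level), not the algorithm's actual calculation, and `compute_levels`, `LEVELS` and `WIDTH` are hypothetical names.

```c
#define LEVELS 5   /* plays the role of r */
#define WIDTH  8   /* number of k entries per level (fixed here for simplicity) */

/* Level i depends on all of level i - 1, so the outer loop runs in order.
 * The caller must fill in R[1] before calling. */
void compute_levels(double R[LEVELS + 1][WIDTH])
{
    for (int i = 2; i < LEVELS + 1; i++) {
        /* Within one level the k iterations are independent,
         * so they can be shared among threads. */
        #pragma omp parallel for
        for (int k = 0; k < WIDTH; k++) {
            double sum = 0.0;
            for (int j = 0; j <= k; j++)   /* inner length varies with k */
                sum += R[i - 1][j];
            R[i][k] = sum;
        }
        /* Synchronization point: the parallel loop ends with an implicit
         * barrier, so level i is complete before level i + 1 starts. */
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the code runs sequentially with the same result. On distributed memory, the barrier would be replaced by an exchange of the newly computed R[i] entries between processes.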
Platforms
Distributed memory platforms:
SGI O2 NOW: MIPS R5000, 180 MHz
P IV NOW: 1.8 GHz
P III Cluster: 1 GHz
P IV Cluster: Xeon 2.2 GHz
Platforms
Shared memory platforms:
SGI Power Challenge 10000: 8 MIPS R10000
SGI Origin 2000: 16 MIPS R12000, 300 MHz
Understanding the Results
n – degree of polynomial (<= 25)
r – size of a matrix (<= 25)
Sequential algorithm – $O(n^2 r^5)$
Average of multiple runs
Unloaded platforms
Sequential Run Times (n=25, r=25)
Platform Times (sec)
SGI O2 NOW 2645.30
P IV NOW 22.94
P III Cluster 26.10
P IV Cluster 18.75
SGI Power Challenge 913.99
SGI Origin 2000 552.95
Results – Distributed Memory
Speedup
SGI O2 NOW - slowdown
P IV NOW - minimal speedup
Speedup (P III & P IV Clusters)
Results – Shared Memory
Excellent results!!!
Speedup (SGI Power Challenge)
Speedup (SGI Origin 2000)
Superlinear speedup!
Run times (SGI Power Challenge)
8 processors
Run times (SGI Origin 2000)
n =25
Run times (SGI Power Challenge)
r =20
Efficiency
Processors            2      4      6      8      16     24
P III Cluster         89.7%  76.5%  61.3%  58.5%  40.1%  25.0%
P IV Cluster          88.3%  68.2%  49.9%  46.9%  26.1%  15.5%
SGI Power Challenge   99.7%  98.2%  97.9%  95.8%  n/a    n/a
SGI Origin 2000       99.9%  98.7%  99.0%  98.2%  93.8%  n/a
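The efficiency figures can be related to speedup with a few helper functions. This assumes the usual definitions (speedup S = T_sequential / T_parallel, efficiency E = S / p), which the slides do not spell out. The function names are hypothetical.

```c
/* Speedup S = T_seq / T_par. */
double speedup(double t_seq, double t_par)
{
    return t_seq / t_par;
}

/* Efficiency E = S / p, returned as a fraction between 0 and 1
 * (values above 1 would indicate superlinear speedup). */
double efficiency(double t_seq, double t_par, int procs)
{
    return speedup(t_seq, t_par) / procs;
}

/* Recover the speedup implied by an efficiency given in percent. */
double speedup_from_efficiency(double eff_percent, int procs)
{
    return eff_percent / 100.0 * procs;
}
```

Under these definitions, the 25.0% efficiency of the P III Cluster on 24 processors corresponds to a speedup of 6, which is what `speedup_from_efficiency(25.0, 24)` returns.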
Conclusions
We have performed an exhaustive search of all available algorithms;
We have implemented the sequential version of Busłowicz’s algorithm;
We have implemented two versions of the parallel algorithm;
We have tested the parallel algorithm on 6 different platforms;
We have obtained excellent speedup and efficiency in a shared memory environment.
Future Work
Study the behavior of the algorithm for larger problem sizes (distributed memory).
Re-evaluate message passing in distributed memory implementation.
Extend Busłowicz’s algorithm to inverting multivariable polynomial matrices $H(s_1, s_2, \dots, s_k)$.
Questions